Abstract


Shape analysis is a promising technique for statically ver- 
ifying and extracting properties of programs that manip- 
ulate complex data structures. We introduce a new char- 
acterization of constraints that arise in parametric shape 
analysis based on manipulation of three-valued structures 
as dataflow facts. 

We identify an interesting syntactic class of first-order 
logic formulas that captures the meaning of three-valued 
structures under concretization. This class is broader than 
previously introduced classes, allowing for a greater flex- 
ibility in the formulation of shape analysis constraints in 
program annotations and internal analysis representations. 
Three-valued structures can be viewed as one possible nor- 
mal form of the formulas in our class. 

Moreover, we characterize the meaning of three-valued 
structures under “tight concretization”. We show that the 
seemingly minor change from concretization to tight con- 
cretization increases the expressive power of three-valued 
structures in such a way that the resulting constraints are 
closed under all boolean operations. We call the resulting 
constraints boolean shape analysis constraints. 

The main technical contribution of this paper is a natu- 
ral syntactic characterization of boolean shape analysis con- 
straints as arbitrary boolean combinations of first-order sen- 
tences of certain form, and an algorithm for transforming 
such boolean combinations into the normal form that corre- 
sponds directly to three-valued structures. 

Our result holds in the presence of arbitrary shape anal- 
ysis instrumentation predicates. The result enables the re- 
duction (without any approximation) of the entailment and 
the equivalence of shape analysis constraints to the satisfia- 
bility of shape analysis constraints. When the satisfiability 
of the constraints is decidable, our result implies that the 
entailment and the equivalence of the constraints are also 
decidable, which enables the use of constraints in a compo- 
sitional shape analysis with a predictable behavior. 
1 Introduction


Dynamically Allocated Data Structures Modern 
software is becoming increasingly complex. This complexity 
corresponds to the complex web of relationships between 
different program entities. Object-oriented programming 
languages such as Java use references between objects dy- 
namically allocated in the heap to model the relationships 
between entities of the application domain. Dynamic allo- 
cation of objects provides flexibility that helps applications 
adapt to the dynamically changing environment. To model 
the evolution of the relationships between objects, applica- 
tions perform destructive updates of the heap. Because writ- 
ing applications in this programming model is error-prone, 
tools for statically verifying partial correctness of such pro- 
grams are very valuable. 


Shape Analysis Shape analysis techniques [49]
can verify and derive precise properties of objects in the
heap. Shape analysis therefore appears essential for reason- 
ing about programs written in modern imperative program- 
ming languages. Shape analysis is promising as a general- 
purpose verification technique, because of its ability to rea- 
son about graphs as general structures, and the ability to 
summarize properties of unbounded sets of objects. Shape 
analysis such as [49] is effective in deriving program proper- 
ties at each program point and synthesizing loop invariants 
while maintaining high precision and strong soundness guar- 
antees. 


Program Specifications The ability to write program 
specifications can greatly improve the effectiveness of shape 
analysis (and, for that matter, the effectiveness of any static 
or dynamic analysis in general). First of all, specifications 
indicate the desired property to be verified. Next, specifi- 
cations allow the use of assume/guarantee reasoning, which 
improves the scalability of the analysis and enables its ap- 
plication to reusable program components. Finally, if neces- 
sary, specifications can guide the static analysis and provide 
hints for it, while at the same time leaving a documentation 
trace explaining the correctness of the program. 


Analysis-Specification Gap The representation of pro- 
gram properties used by the program analysis is often differ- 
ent from the representation of program properties that is ap- 
propriate for program annotations. To synthesize invariants 
using a fixpoint computation, program analysis often uses a 
finite lattice of program properties. On the other hand, pro- 
gram annotations should be expressed in some convenient, 
well-known notation, such as a variation of first-order logic. 
A program analysis that utilizes program specifications must 
bridge the gap between the analysis representation and the 
program annotations. 


Logic-Based Shape Analysis A promising shape anal- 
ysis approach based on abstract interpretation uses 
the lattice of three-valued logical structures for fixpoint com- 
putation. The fact that the approach is based on logic 
makes bridging the gap between the program annotations 
and the analysis representation much easier, yet it does not 
eliminate it entirely. The original TVLA system [38] uses
three-valued structures to specify preconditions, which corresponds
to specifications with disjoint, non-empty sets of
objects and is sometimes unnecessarily verbose. The followup
work shows how to use arbitrary first-order
formulas for program annotations and convert the annotations
to three-valued structures using a theorem prover.
Because first-order logic is undecidable in general, it is
interesting to consider alternative approaches with a potentially
more predictable behavior.


1.1 Contributions 


Mediating the Analysis-Specification Gap This pa- 
per addresses the gap between program annotations and 
three-valued structures by providing an algorithm for trans- 
forming annotations (expressed as formulas) into three- 
valued structures, as well as a way of viewing a class of 
canonical program annotations as three-valued structures. 
Because we restrict our attention to formulas of a particular 
form, we are able to find a complete and sound algorithm 
for generating three-valued structures. The completeness 
makes our algorithm potentially more predictable than the 
use of theorem provers on arbitrary formulas. Our algorithm 
shows that the expressive power of our specifications is equal 
to the expressive power of three-valued structures. Never- 
theless, our specifications may use sets that are potentially 
intersecting or empty, which makes the annotations more 
flexible than three-valued structures themselves where sum- 
mary nodes represent only disjoint sets of nodes. Moreover, 
the characterization of existing shape analysis constraints 
as disjunctive normal forms of formulas suggests that al- 
ternative representations for three-valued structures may be 
possible [42]. 

The characterization of three-valued structures by for- 
mulas allows us to easily prove properties that are less ob- 
vious in the three-valued structure view, such as closure 
of three-valued structures under conjunction. To compute 
the conjunction of three-valued structures, we use the fact 
that three-valued structures correspond to disjunctive nor- 
mal forms of positive boolean combinations of formulas; the 
computation of the conjunction of three-valued structures 
then corresponds to a transformation of a conjunction of 
two disjunctive normal forms into a new disjunctive normal 
form. 
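
As a sketch of this transformation (the names $C_i$ and $D_j$ for the individual conjunctions are ours and serve only as an illustration), distributing conjunction over disjunction gives:

```latex
\Big(\bigvee_{i=1}^{m} C_i\Big) \land \Big(\bigvee_{j=1}^{n} D_j\Big)
\;\equiv\;
\bigvee_{i=1}^{m} \bigvee_{j=1}^{n} \big(C_i \land D_j\big)
```

Each conjunction $C_i \land D_j$ on the right-hand side may still have to be brought into canonical form before it corresponds to a three-valued structure.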


Boolean Shape Analysis Constraints By considering 
the “tight concretization” semantics instead of the con- 
cretization semantics of three-valued structures, we obtain a 
richer class of formulas, namely the class of all boolean com- 
binations of certain atomic formulas. This characterization 
implies that three-valued structures under tight concretization are
closed under all boolean operations. We therefore call the 
constraints arising from tight concretization of three-valued 
structures boolean shape analysis constraints. 

Although the notion of tight concretization is not new, 
the characterization of boolean shape analysis constraints 
as boolean combinations of certain formulas is surprisingly 
elegant and has not been observed before. 


Consequences of Boolean Closure The resulting clo- 
sure properties of boolean shape analysis constraints have 
several potential uses. The closure under disjunction is nec- 
essary for fixpoint computation in dataflow analysis and 
can be conveniently computed even for shape analysis constraints;
what our results show is that boolean shape analysis
constraints are also closed under conjunction and negation.

The conjunction of constraints is needed, for example, in 
compositional interprocedural shape analysis, which com- 
putes the relation composition of relations on states. Con- 
junction allows the analysis to simultaneously retain the 
call-site specific information that the callee preserves across 
the call, and the postcondition which summarizes the ac- 
tions of the callee. 

The negation of constraints is useful for expressing de- 
terministic branches in control-flow graphs. For example, an 
if statement with the condition c results in conjoining the
dataflow fact d to yield $d \land c$ in the if branch, and $d \land \lnot c$ in
the else branch. Similarly, the assert(c) statement, which
is an important mechanism for program specification, has
(in the relational semantics) the condition $\lnot c$ for the branch
which leads to an error state.

Finally, the closure under negation implies that both 
the implication and the equivalence of shape analysis con- 
straints are reducible to the satisfiability of shape analysis 
constraints. This result is in contrast to the “regular graph
constraints” of [35], which have a decidable satisfiability
problem but undecidable implication and equivalence problems.
The entailment problem is also important for compositional
analysis which uses assume/guarantee reasoning. By
introducing history variables that store the initial state of 
the program, a compositional interprocedural shape analysis 
can use shape analysis constraints to represent relations on 
program states. The fundamental operations of such compositional
shape analysis are computing the best approximation
of relation composition and checking inclusion between
relations. Closure under boolean operations allows reducing
all these operations to the satisfiability of shape analysis
constraints. 
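
Spelled out for two constraints (the names $C_1$ and $C_2$ are ours), these reductions rest on the standard equivalences:

```latex
C_1 \models C_2
  \iff C_1 \land \lnot C_2 \text{ is unsatisfiable}
\qquad
C_1 \equiv C_2
  \iff (C_1 \land \lnot C_2) \lor (C_2 \land \lnot C_1) \text{ is unsatisfiable}
```

Closure under negation, conjunction, and disjunction guarantees that the formulas on the right-hand sides are again boolean shape analysis constraints, so a satisfiability procedure for the constraints suffices.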


Scope of the Result Our result is relevant in the pres- 
ence of shape analysis instrumentation predicates defined 
using arbitrary first-order formulas. What the particular 
choice of instrumentation predicates determines is whether 
the satisfiability problem for boolean shape analysis con- 
straints is decidable. If the satisfiability problem for shape 
analysis constraints with a particular choice of instrumenta- 
tion predicates is decidable, our closure results imply that 
the entailment problem is also decidable, and that the con- 
straints are suitable for use in an instantiation of the shape 
analysis framework. 


Summary of contributions We can summarize the con- 
tributions of this paper as follows: 


1. We give a concrete example that shows how elements 
of the lattice for fixpoint computation can be viewed as 
formulas in a canonical form; we believe that this idea 
is useful in general. 


2. We identify a syntactic class of formulas whose ex- 
pressive power matches exactly the semantics of three- 
valued structures under concretization. The resulting 
constraints are closed under disjunction and conjunc- 
tion, but are not necessarily closed under negation. 


3. We identify a syntactic class of formulas whose expressive
power matches exactly the semantics of three-valued
structures under tight concretization. The resulting
boolean shape analysis constraints are closed under
all boolean operations such as disjunction, conjunction,
negation, implication, and equivalence.


4. We observe that the closure under all boolean opera- 
tions allows reducing the entailment and the equiva- 
lence problems to the satisfiability problem of boolean 
shape analysis constraints. 


5. We show that each three-valued structure has a model 
within the set of two-valued structures, which means 
that the satisfiability problem of shape analysis con- 
straints is trivial over the set of all two-valued struc- 
tures. 


6. We show that, even in the presence of instrumentation 
predicates, our results allow reducing the entailment 
and the equivalence problems of shape analysis con- 
straints to the satisfiability problem. 


1.2 Organization of the Paper 


The rest of the paper is organized as follows. Section 2
reviews the basic notions of two-valued and three-valued
structures. Section 3 presents a series of syntactic classes
of formulas of equal expressive power that all characterize
the meaning of three-valued structures under concretization
(stated as a corollary in that section). As a consequence, we
derive the closure of constraints under disjunction and
conjunction (Corollary 29). Section 3 is to some extent a
preparation for Section 4. Section 4 introduces a series of
formulas that have the same expressive power and characterize
the three-valued structures under tight concretization
(Definition 30), and introduces the name boolean shape analysis
constraints (Definition 46). We then observe that boolean shape
analysis constraints are closed under all boolean operations
and derive some consequences of these closure properties.
Section 4.2 shows that boolean shape analysis constraints
are the smallest extension of three-valued structures under
concretization which is closed under all boolean operations
(Proposition 51). Section 4.3 shows how to transform
a three-valued structure into a structure where all unary
predicates have definite values. Section 5 introduces the
decision problems for three-valued structures, shows that
every three-valued structure is satisfiable, and derives the
decidability of the implication and the equivalence as a
consequence of the decidability of satisfiability and the closure
under boolean operations. Section 6 generalizes the results
of the previous sections to the case when the values of some
predicates are constrained by first-order formulas. Section 7
presents the related work and Section 8 concludes.


2 Preliminaries 


In this section we define some preliminary concepts used 
throughout the paper. We mostly follow the setup of 
and for completeness repeat some of the definitions from 

Let $\mathcal{A}$ be a finite set of unary relation symbols (with a
typical element $A \in \mathcal{A}$) and $\mathcal{F}$ a finite set of binary relation
symbols (with a typical element $f \in \mathcal{F}$). For simplicity, we consider
only unary and binary relation symbols because they appear 
to be the most useful cases. Most of our results generalize 
naturally to n-ary relations. 


Two-Valued Structures We next introduce two-valued 
structures. A two-valued structure consists of a domain $U^\natural$
and the interpretation $\iota^\natural$ of relation symbols. Our language
does not contain function symbols because we represent all
functions as relations. In model theory and logic, a two-valued
structure corresponds to a structure (model) whose
domain $U^\natural$ is finite.


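Definition 1 A two-valued structure is a pair $S^\natural = (U^\natural,\iota^\natural)$
where $U^\natural$ is a finite non-empty set (of “concrete individuals”),
$\iota^\natural(A) \in U^\natural \to \{0,1\}$ for $A \in \mathcal{A}$, and
$\iota^\natural(f) \in (U^\natural)^2 \to \{0,1\}$ for $f \in \mathcal{F}$. Let


$\text{2-STRUCT} = \{ S^\natural \mid S^\natural = (U^\natural,\iota^\natural) \text{ is a two-valued structure} \}$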


In program analysis, each two-valued structure represents a 
state of the program. The use of structures for representing 
program state has proven useful in the shape analysis [49], 
Abstract State Machines [7], the Alloy modelling language 
and analyzer [27], and relational databases [15]. 


Three-Valued Structures A three-valued structure is a
model for Kleene’s three-valued logic and differs
from a two-valued structure in that predicates can
have three possible values: {0}, {1}, and {0,1}. (The truth
values {0}, {1}, {0,1} of three-valued logic are denoted by,
respectively, 0, 1, 1/2 in [49].)


Definition 2 A three-valued structure is a pair $S = (U,\iota)$
where $U$ is a finite non-empty set (of “abstract individuals”),
$\iota(A) \in U \to \{\{0\},\{1\},\{0,1\}\}$ for $A \in \mathcal{A}$, and
$\iota(f) \in U^2 \to \{\{0\},\{1\},\{0,1\}\}$ for $f \in \mathcal{F}$. Let


$\text{3-STRUCT} = \{ S \mid S = (U,\iota) \text{ is a three-valued structure} \}$
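

To make Definitions 1 and 2 concrete, the following Python sketch encodes a structure as a triple of a universe and two dictionaries of interpretation tables. The encoding, the function names, and the list example are our own illustrative assumptions and are not taken from the framework of [49]; three-valued truth values are represented, as in Definition 2, by the sets {0}, {1}, and {0,1}.

```python
from itertools import product

# Truth values of Definition 2, encoded as sets of two-valued truth values.
FALSE, TRUE, UNKNOWN = frozenset({0}), frozenset({1}), frozenset({0, 1})


def is_structure(universe, unary, binary, values):
    """Check that every relation symbol is interpreted totally over a
    non-empty universe, using only the permitted truth values."""
    if not universe:
        return False
    pairs = set(product(universe, repeat=2))
    for table in unary.values():
        if set(table) != set(universe) or any(v not in values for v in table.values()):
            return False
    for table in binary.values():
        if set(table) != pairs or any(v not in values for v in table.values()):
            return False
    return True


def is_two_valued(universe, unary, binary):
    return is_structure(universe, unary, binary, {0, 1})


def is_three_valued(universe, unary, binary):
    return is_structure(universe, unary, binary, {FALSE, TRUE, UNKNOWN})


# Hypothetical concrete state: a two-cell list whose first cell is pointed to by x.
concrete = (
    {"c1", "c2"},
    {"x": {"c1": 1, "c2": 0}},
    {"next": {("c1", "c1"): 0, ("c1", "c2"): 1, ("c2", "c1"): 0, ("c2", "c2"): 0}},
)

# Hypothetical abstract state: node "rest" stands for one or more list cells.
abstract = (
    {"head", "rest"},
    {"x": {"head": TRUE, "rest": FALSE}},
    {"next": {("head", "head"): FALSE, ("head", "rest"): UNKNOWN,
              ("rest", "head"): FALSE, ("rest", "rest"): UNKNOWN}},
)
```

Representing the value 1/2 as the set {0,1} makes the later containment conditions between abstract and concrete values a plain membership test.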


The parametric shape analysis framework [49] uses three- 
valued structures to specify sets of two-valued structures 
according to Definition 4 below.


Formulas We assume the usual syntax and semantics of
first-order logic. We use an abstract view of the syntax
of formulas in first-order logic which takes into account
associativity, commutativity, and idempotence of conjunction
and disjunction, and the property $\lnot\lnot p = p$. A conjunction
with zero conjuncts denotes true; a disjunction with zero
disjuncts denotes false.

If $S^\natural$ is a two-valued structure and $F$ a formula with free
variables $x_1,\ldots,x_n$ and $u_1^\natural,\ldots,u_n^\natural \in U^\natural$, then
$e = [x_1 \mapsto u_1^\natural, \ldots, x_n \mapsto u_n^\natural]$ denotes an environment mapping $x_i$ to $u_i^\natural$
for all $1 \le i \le n$, and $[\![F]\!]^{S^\natural} e$ denotes the value $v \in \{0,1\}$
of the formula $F$ in the model $S^\natural$ under the environment
$e$. Instead of $[\![F(x)]\!]^{S^\natural} [x \mapsto u^\natural] = 1$ we sometimes write $S^\natural \models F(u^\natural)$
and omit $S^\natural$ if it is understood from the context. If $F$
has no free variables we denote the truth value $v$ of $F$ in $S^\natural$
simply by $[\![F]\!]^{S^\natural}$ and write $S^\natural \models F$ for $[\![F]\!]^{S^\natural} = 1$. Definition 3
below defines the set of models of a formula in the expected
way.


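Definition 3 (Models of sets of Formulas) Let $F$ be a
first-order formula. Then


$\gamma_F(F) = \{ S^\natural \in \text{2-STRUCT} \mid [\![F]\!]^{S^\natural} = 1 \}$

If $\mathcal{C}$ is a set of formulas, define

$\mathrm{models}[\mathcal{C}] = \{ \gamma_F(F) \mid F \in \mathcal{C} \}$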


The transitive closure operator or inductive definitions 
are useful for describing instrumentation predicates (Section 6),
but the presence of such constructs in logic is largely
orthogonal to the results of this paper. 

For simplicity we treat equality like any other binary 
relation symbol and do not treat summary nodes specially, 
but our results are also useful in the presence of summary 
nodes (see [36], as well as Section 6).


3 Three-Valued Structures with Concretization 


This section uses first-order formulas to characterize the
meaning of three-valued structures under the usual concretization
function. Section 4 presents an alternative semantics
using tight concretization, which yields constraints
with better closure properties.

The following notion of concretization corresponds to 
Definition 3.5]. The concretization function y* provides the 
semantics for sets of three-valued structures. 


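Definition 4 (Homomorphism and Concretization)
Let $S^\natural = (U^\natural,\iota^\natural)$ be a two-valued structure, $S = (U,\iota)$ a
three-valued structure, and $h : U^\natural \to U$ a surjective total
function. We write $S^\natural \sqsubseteq^h S$ iff


1. for every $A \in \mathcal{A}$ and $u \in U$:
$\iota(A)(u) \supseteq \{ \iota^\natural(A)(u^\natural) \mid h(u^\natural) = u \}$
2. for every $f \in \mathcal{F}$ and $u_1,u_2 \in U$:


$\iota(f)(u_1,u_2) \supseteq \{ \iota^\natural(f)(u_1^\natural,u_2^\natural) \mid h(u_1^\natural) = u_1 \land h(u_2^\natural) = u_2 \}$


We write $S^\natural \sqsubseteq S$ iff there exists a surjective total function
$h$ such that $S^\natural \sqsubseteq^h S$. We call any such $h$ a homomorphism
from $S^\natural$ to $S$. The concretization of a three-valued structure
$S$, denoted $\gamma(S)$, is given by:


$\gamma(S) = \{ S^\natural \mid S^\natural \sqsubseteq S \}$


We extend $\gamma$ to $\gamma^*$ acting on sets of three-valued structures
so that the set denotes a disjunction:


$\gamma^*(\mathcal{S}) = \bigcup_{S \in \mathcal{S}} \gamma(S)$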


The function $h$ from Definition 4 is called “embedding” in
[49]. (We choose to call $h$ a “homomorphism” because in the
literature the term “embedding” sometimes implies injectivity,
whereas in shape analysis $h$ is not required to be injective,
and almost never is injective.)
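

The two conditions of Definition 4 can be checked mechanically. The Python sketch below tests $S^\natural \sqsubseteq^h S$ for a given $h$ and decides membership in $\gamma(S)$ by trying every total map; the tuple encoding of structures, the function names, and the list example are illustrative assumptions of ours, and the brute-force search is exponential and intended only for very small structures.

```python
from itertools import product

FALSE, TRUE, UNKNOWN = frozenset({0}), frozenset({1}), frozenset({0, 1})


def embeds_via(h, concrete, abstract):
    """Check that h is surjective and satisfies conditions 1 and 2 of Definition 4."""
    c_univ, c_unary, c_binary = concrete
    a_univ, a_unary, a_binary = abstract
    if {h[c] for c in c_univ} != set(a_univ):          # surjectivity
        return False
    for A, table in a_unary.items():                   # condition 1
        if any(c_unary[A][c] not in table[h[c]] for c in c_univ):
            return False
    for f, table in a_binary.items():                  # condition 2
        if any(c_binary[f][(c1, c2)] not in table[(h[c1], h[c2])]
               for c1, c2 in product(c_univ, repeat=2)):
            return False
    return True


def in_concretization(concrete, abstract):
    """Decide membership in the concretization by enumerating all total maps."""
    c_univ, a_univ = sorted(concrete[0]), sorted(abstract[0])
    return any(embeds_via(dict(zip(c_univ, image)), concrete, abstract)
               for image in product(a_univ, repeat=len(c_univ)))


def list_state(edges):
    """Hypothetical three-cell heap where x points to c1 and next is given by edges."""
    cells = ["c1", "c2", "c3"]
    return (set(cells),
            {"x": {c: int(c == "c1") for c in cells}},
            {"next": {(a, b): int((a, b) in edges) for a in cells for b in cells}})


abstract = (
    {"head", "rest"},
    {"x": {"head": TRUE, "rest": FALSE}},
    {"next": {("head", "head"): FALSE, ("head", "rest"): UNKNOWN,
              ("rest", "head"): FALSE, ("rest", "rest"): UNKNOWN}},
)

acyclic = list_state({("c1", "c2"), ("c2", "c3")})
cyclic = list_state({("c1", "c2"), ("c2", "c3"), ("c3", "c1")})
```

In this example the acyclic list is in the concretization of the abstract structure, while the cyclic one is not, because the abstract structure forbids any edge from "rest" back to "head".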


Bounded Structures Each set of three-valued structures
$\mathcal{S}$ specifies a set of heaps $\gamma^*(\mathcal{S})$. Each such set $\gamma^*(\mathcal{S})$
is definable as the set of models of a formula in existential
monadic second-order logic; the second-order existential
quantification arises from the existential quantification
over the homomorphisms $h$. Constraints that involve unrestricted
second-order existential quantifications have several
undesirable properties [35], [34]. We therefore restrict our
attention to bounded structures, where the homomorphism $h$ is
determined as the natural map associated with the partition
of the elements of $U^\natural$ according to the values of some chosen
finite set of predicates.

For the purpose of this paper, we define bounded structures
as follows. Let $\mathcal{A}_1 \subseteq \mathcal{A}$ be a finite subset of unary
predicates. We call elements of $\mathcal{A}_1$ abstraction predicates.


Definition 5 (Bounded Structure) We say that a
three-valued structure $S = (U,\iota)$ is $\mathcal{A}_1$-bounded iff both of
the following two conditions hold:


1. $\iota(A)(u) \in \{\{0\},\{1\}\}$ for all $A \in \mathcal{A}_1$ and all $u \in U$;


2. if $u_1,u_2 \in U$ and $u_1 \neq u_2$ then $\iota(A)(u_1) \neq \iota(A)(u_2)$ for
some $A \in \mathcal{A}_1$.
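

A minimal sketch of how the two conditions of Definition 5 might be checked, assuming (our choice, not the paper's) that the unary interpretation is stored as a dictionary of tables and that truth values are the sets {0}, {1}, {0,1}. Condition 2 is tested by comparing the vectors of abstraction-predicate values of all individuals.

```python
FALSE, TRUE, UNKNOWN = frozenset({0}), frozenset({1}), frozenset({0, 1})


def is_bounded(universe, unary, abstraction_preds):
    """Check both conditions of Definition 5 for the given abstraction predicates."""
    preds = sorted(abstraction_preds)
    # Condition 1: abstraction predicates have definite values everywhere.
    if any(unary[A][u] not in (FALSE, TRUE) for A in preds for u in universe):
        return False
    # Condition 2: distinct individuals differ on some abstraction predicate,
    # i.e. no two individuals share the same vector of values.
    signatures = [tuple(unary[A][u] for A in preds) for u in universe]
    return len(set(signatures)) == len(signatures)


# Hypothetical interpretation of two unary predicates over three abstract nodes.
unary = {
    "x": {"head": TRUE, "rest": FALSE, "tail": FALSE},
    "r": {"head": TRUE, "rest": UNKNOWN, "tail": FALSE},
}
```

With `{"x"}` as the abstraction predicates, the universe `{"head", "rest"}` is bounded, whereas adding `"tail"` violates condition 2 because `"rest"` and `"tail"` agree on `x`.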


Definition 6 (Concretization Definability) The set of
sets of heaps definable via three-valued structures with
concretization is defined by:


$\mathrm{models}[\mathcal{T}_1] = \{ \gamma^*(\mathcal{S}) \mid \mathcal{S} \text{ a finite set of } \mathcal{A}_1\text{-bounded three-valued structures} \}$


Note that we use the same notation $\mathrm{models}[X]$ when $X$ denotes
a set of structures (Definition 6) and when $X$ denotes
a set of formulas (Definition 3). There is no confusion because
we use distinct names for sets of structures and sets
of formulas.

We proceed to characterize the set $\mathrm{models}[\mathcal{T}_1]$ as the set
of models of formulas of a certain form.

We define the notion of a cube first. 


Definition 7 (Exponent Notation) If $A \in \mathcal{A}$ and $\alpha \in \{0,1\}$
then $A^\alpha$ is defined by $A^1 = A$ and $A^0 = \lnot A$.


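Definition 8 (Cube) A cube over $\mathcal{A}_1$ (or just “cube” for
short) is an expression $P(x)$ of the form


$A_1^{\alpha_1}(x) \land \ldots \land A_q^{\alpha_q}(x)$


where $\mathcal{A}_1 = \{A_1,\ldots,A_q\}$ and $\alpha_1,\ldots,\alpha_q \in \{0,1\}$.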
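
The following Python sketch enumerates the cubes over a set of abstraction predicates and evaluates a cube on one individual. The dictionary encoding of a cube (predicate name mapped to its exponent) and the function names are illustrative assumptions of ours.

```python
from itertools import product


def cubes(abstraction_preds):
    """Enumerate all cubes over the given predicates; a cube maps each
    predicate name to its exponent (1 for A, 0 for not A)."""
    names = sorted(abstraction_preds)
    return [dict(zip(names, bits)) for bits in product((0, 1), repeat=len(names))]


def holds(cube, valuation):
    """Evaluate a cube on an individual described by its predicate values."""
    return all(valuation[A] == bit for A, bit in cube.items())


all_cubes = cubes({"x", "y"})
individual = {"x": 1, "y": 0}
matching = [c for c in all_cubes if holds(c, individual)]
```

Because a cube fixes the polarity of every abstraction predicate, each individual satisfies exactly one cube, which is why cubes partition the universe of any two-valued structure.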
$\mathcal{R}_1$-literals are the building blocks for formulas used to
form constraints that characterize $\mathrm{models}[\mathcal{T}_1]$.


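Definition 9 ($\mathcal{R}_1$-literal) Let $P_1(x), P_2(y)$ range over
cubes over $\mathcal{A}_1$, let $A$ range over elements of $\mathcal{A} \setminus \mathcal{A}_1$, and
let $f$ range over $\mathcal{F}$.

An $\mathcal{R}_1$-literal is a formula of one of the following forms:


$\exists x.\ P_1(x)$                                                  node present
$\lnot\exists x.\ P_1(x)$                                             node absent
$\lnot\exists x.\ P_1(x) \land A(x)$                                  property does not hold
$\lnot\exists x.\ P_1(x) \land \lnot A(x)$                            property holds
$\lnot\exists x \exists y.\ P_1(x) \land P_2(y) \land f(x,y)$         no edge
$\lnot\exists x \exists y.\ P_1(x) \land P_2(y) \land \lnot f(x,y)$   must edge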


We first introduce the class of $\mathcal{R}_1$-formulas that satisfy
syntactic invariants that make them isomorphic to three-valued
structures.


Definition 10 ($\mathcal{R}_1$-formulas) Let $P(x)$, $P_1(x)$, $P_2(y)$ denote
cubes over $\mathcal{A}_1$. A canonical conjunction of $\mathcal{R}_1$-literals is
a conjunction of $\mathcal{R}_1$-literals that satisfies the following
conditions:


1. for each $P(x)$ a cube over $\mathcal{A}_1$, exactly one of the conjuncts
$\exists x.P(x)$ and $\lnot\exists x.P(x)$ occurs in the conjunction;


2. there is at least one cube $P(x)$ such that the conjunct
$\exists x.P(x)$ occurs in the conjunction;


3. if the conjunct $\lnot\exists x.P(x)$ occurs, then this conjunct is
the only occurrence of the cube $P(x)$ (and the cube
$P(y)$) in the conjunction;


4. for each cube $P(x)$, and $A \in \mathcal{A} \setminus \mathcal{A}_1$, at most one of
the conjuncts $\lnot\exists x.P(x) \land A(x)$ and $\lnot\exists x.P(x) \land \lnot A(x)$
occurs;


5. for every two cubes $P_1(x)$ and $P_2(y)$, at most one of
the conjuncts


$\lnot\exists x \exists y.\ P_1(x) \land P_2(y) \land f(x,y)$


and


$\lnot\exists x \exists y.\ P_1(x) \land P_2(y) \land \lnot f(x,y)$


occurs.


Define an $\mathcal{R}_1$-formula as any disjunction of canonical
conjunctions of $\mathcal{R}_1$-literals.


In Definition 10 and throughout the paper, the symbol $\mathcal{R}_1$
alone denotes the set of $\mathcal{R}_1$-formulas, so $\mathrm{models}[\mathcal{R}_1]$ is the
collection of the model sets of all $\mathcal{R}_1$-formulas (as opposed to,
for example, the model sets of all $\mathcal{R}_1$-literals).

The following Proposition 11 shows that three-valued
structures and $\mathcal{R}_1$-formulas define the same sets of two-valued
structures. The proof of Proposition 11 is straightforward
because the set of $\mathcal{R}_1$-formulas was chosen to facilitate the
proof. The proof shows that there is a semantics-preserving
bijection between bounded three-valued structures and canonical
conjunctions of $\mathcal{R}_1$-literals.


Proposition 11 $\mathrm{models}[\mathcal{R}_1] = \mathrm{models}[\mathcal{T}_1]$


Proof. The idea of the proof is the following. Each
bounded three-valued structure can be represented as a
canonical conjunction of $\mathcal{R}_1$-literals, and each canonical
conjunction of $\mathcal{R}_1$-literals can be represented as a bounded
three-valued structure. Therefore, disjunctions of canonical
conjunctions of $\mathcal{R}_1$-literals correspond to sets of bounded
three-valued structures.

We next give a function $\mu$ mapping each bounded three-valued
structure $S$ to a canonical conjunction of $\mathcal{R}_1$-literals
$\mu(S)$. We show that $S$ and $\mu(S)$ represent the same set of
two-valued structures. Moreover, each canonical conjunction of
$\mathcal{R}_1$-literals is equal to $\mu(S)$ for some three-valued structure
$S$.

Let $S = (U,\iota)$ be an $\mathcal{A}_1$-bounded three-valued structure.
Define the formula $\mu(S)$ as the conjunction of the following
$\mathcal{R}_1$-literals.

Define first, for each $u \in U$, a cube over $\mathcal{A}_1$ corresponding
to $u$, denoted $\pi(u)(x)$, by


$\pi(u)(x) = \bigwedge_{A \in \mathcal{A}_1} A^{\alpha_A}(x)$


where

$\alpha_A = \begin{cases} 1, & \text{if } \iota(A)(u) = \{1\} \\ 0, & \text{if } \iota(A)(u) = \{0\} \end{cases}$


$\alpha_A$ is well-defined because $\iota(A)(u) \in \{\{0\},\{1\}\}$ for $A \in \mathcal{A}_1$. We
next introduce the $\mathcal{R}_1$-literals.


Node existence. For each $u \in U$, introduce an $\mathcal{R}_1$-literal


$\exists x.\ \pi(u)(x)$ (1)


For each remaining $\mathcal{A}_1$-cube $P(x)$, that is, for each cube
$P(x)$ such that $\pi(u)(x) \neq P(x)$ for all $u \in U$, introduce an
$\mathcal{R}_1$-literal


$\lnot\exists x.\ P(x)$ (2)


Node properties. Let $u \in U$ and $A \in \mathcal{A} \setminus \mathcal{A}_1$. If $\iota(A)(u) = \{1\}$,
introduce the $\mathcal{R}_1$-literal


$\lnot\exists x.\ \pi(u)(x) \land \lnot A(x)$ (3)

If $\iota(A)(u) = \{0\}$, introduce the literal


$\lnot\exists x.\ \pi(u)(x) \land A(x)$ (4)


If $\iota(A)(u) = \{0,1\}$, we introduce no conjuncts.

Edges. Let $u_1,u_2 \in U$ (we allow $u_1 = u_2$) and let $f \in \mathcal{F}$. If
$\iota(f)(u_1,u_2) = \{1\}$, introduce the must-edge $\mathcal{R}_1$-literal


$\lnot\exists x \exists y.\ \pi(u_1)(x) \land \pi(u_2)(y) \land \lnot f(x,y)$ (5)

If $\iota(f)(u_1,u_2) = \{0\}$, introduce the no-edge $\mathcal{R}_1$-literal


$\lnot\exists x \exists y.\ \pi(u_1)(x) \land \pi(u_2)(y) \land f(x,y)$ (6)


If $\iota(f)(u_1,u_2) = \{0,1\}$, we introduce no conjuncts.

Define $\mu(S)$ as the conjunction of all formulas
(1)-(6) introduced as described above.
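
The construction of the literals (1)-(6) can be sketched in Python as follows. The tuple encoding of literals is an assumption of ours: `("exists", P)` and `("not_exists", P)` stand for the node literals, `("no_node_with", P, A, s)` stands for the literal forbidding a node in cube P on which A has polarity s, and `("no_pair_with", P1, P2, f, s)` is the analogous literal for pairs of nodes; a cube is a tuple of (predicate, exponent) pairs.

```python
from itertools import product

FALSE, TRUE, UNKNOWN = frozenset({0}), frozenset({1}), frozenset({0, 1})


def mu(universe, unary, binary, abstraction_preds):
    """Return the set of literals of the canonical conjunction for a bounded structure."""
    names = sorted(abstraction_preds)

    def pi(u):
        # The cube of u: one exponent per abstraction predicate.
        return tuple((A, 1 if unary[A][u] == TRUE else 0) for A in names)

    literals = set()
    present = {pi(u) for u in universe}
    for bits in product((0, 1), repeat=len(names)):
        cube = tuple(zip(names, bits))
        # Literals (1) and (2): every cube is either inhabited or empty.
        literals.add(("exists" if cube in present else "not_exists", cube))
    for u in universe:
        for A, table in unary.items():
            if A in abstraction_preds or table[u] == UNKNOWN:
                continue
            # Value {1} forbids a node with "not A" (3); value {0} forbids a node with A (4).
            literals.add(("no_node_with", pi(u), A, 0 if table[u] == TRUE else 1))
    for f, table in binary.items():
        for u1, u2 in product(universe, repeat=2):
            if table[(u1, u2)] != UNKNOWN:
                # Value {1} forbids a pair with "not f" (5); value {0} forbids a pair with f (6).
                sign = 0 if table[(u1, u2)] == TRUE else 1
                literals.add(("no_pair_with", pi(u1), pi(u2), f, sign))
    return literals


# Hypothetical bounded structure with abstraction predicate x and one extra predicate r.
universe = {"head", "rest"}
unary = {"x": {"head": TRUE, "rest": FALSE}, "r": {"head": TRUE, "rest": UNKNOWN}}
binary = {"next": {("head", "head"): FALSE, ("head", "rest"): UNKNOWN,
                   ("rest", "head"): FALSE, ("rest", "rest"): UNKNOWN}}
literals = mu(universe, unary, binary, {"x"})
```

Indefinite values contribute no literal at all, which mirrors the fact that the value {0,1} places no constraint on the concrete structures.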
We next show $\gamma_F(\mu(S)) = \gamma(S)$. In both directions, we establish
the following property of the homomorphism $h$ from
$S^\natural$ to $S$:


$h(u^\natural) = u$ iff $S^\natural \models \pi(u)(u^\natural)$ (7)
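Direction $\gamma_F(\mu(S)) \supseteq \gamma(S)$. Let $S^\natural \in \gamma(S)$. Then $S^\natural \sqsubseteq^h S$
for some homomorphism $h$. We establish that (7) holds
for $h$. Let $u = h(u^\natural)$. For $A \in \mathcal{A}_1$, we have $\{\iota^\natural(A)(u^\natural)\} = \iota(A)(u)$, so
$\models A^{\alpha_A}(u^\natural)$. Therefore, $\models \pi(u)(u^\natural)$. Conversely, $u^\natural$ satisfies
exactly one cube, and distinct elements of $U$ have distinct cubes by
Definition 5, so $\models \pi(u)(u^\natural)$ implies $h(u^\natural) = u$, which establishes (7).

We next show $S^\natural \models C$ for each conjunct $C$ of $\mu(S)$.

1) Consider $C = \exists x.\pi(u)(x)$ for some $u$. Because $h$ is a
surjection, $h(u^\natural) = u$ for some $u^\natural$, so $\models \pi(u)(u^\natural)$, and $\models C$.

2) Consider $C = \lnot\exists x.P(x)$ for the cube $P(x)$ distinct
from all cubes $\pi(u)(x)$. Consider any $u^\natural \in U^\natural$. Then
$\models \pi(h(u^\natural))(u^\natural)$, and $P(x)$ and $\pi(h(u^\natural))(x)$ are distinct cubes,
so $\not\models P(u^\natural)$. Therefore, $\models C$.

3) Consider $C = \lnot\exists x.\pi(u)(x) \land \lnot A(x)$ for some $A \in \mathcal{A} \setminus \mathcal{A}_1$.
Then $\iota(A)(u) = \{1\}$. Consider any $u^\natural$. If $\not\models \pi(u)(u^\natural)$,
then $u^\natural$ clearly does not satisfy the body of $C$. If $\models \pi(u)(u^\natural)$, then
$h(u^\natural) = u$ by (7), and because $h$ is a homomorphism, $\iota^\natural(A)(u^\natural) = 1$,
so $\not\models \lnot A(u^\natural)$. Hence no $u^\natural$ satisfies the body of $C$, and $\models C$.

4) Consider $C = \lnot\exists x.\pi(u)(x) \land A(x)$ for some $A \in \mathcal{A} \setminus \mathcal{A}_1$.
Analogously to the previous case, $\iota(A)(u) = \{0\}$. Consider
any $u^\natural$. If $\models \pi(u)(u^\natural)$, then $h(u^\natural) = u$, and because $h$ is a
homomorphism, $\iota^\natural(A)(u^\natural) = 0$, so $\not\models A(u^\natural)$ and thus $\models C$.

5) Consider $C = \lnot\exists x \exists y.\pi(u_1)(x) \land \pi(u_2)(y) \land \lnot f(x,y)$.
Then $\iota(f)(u_1,u_2) = \{1\}$. Consider any $u_1^\natural,u_2^\natural \in U^\natural$. If
$\not\models \pi(u_1)(u_1^\natural)$ or $\not\models \pi(u_2)(u_2^\natural)$, then the pair does not satisfy
the body of $C$. Suppose $\models \pi(u_1)(u_1^\natural)$ and $\models \pi(u_2)(u_2^\natural)$. Then $h(u_1^\natural) = u_1$
and $h(u_2^\natural) = u_2$ by (7), and $h$ is a homomorphism, so
$\iota^\natural(f)(u_1^\natural,u_2^\natural) = 1$. Then $\not\models \lnot f(u_1^\natural,u_2^\natural)$, so $\models C$.

6) Consider $C = \lnot\exists x \exists y.\pi(u_1)(x) \land \pi(u_2)(y) \land f(x,y)$.
Analogously to the previous case, $\iota(f)(u_1,u_2) = \{0\}$; for any
$u_1^\natural,u_2^\natural \in U^\natural$, if $\models \pi(u_1)(u_1^\natural)$ and $\models \pi(u_2)(u_2^\natural)$ then
$h(u_1^\natural) = u_1$ and $h(u_2^\natural) = u_2$, so $\iota^\natural(f)(u_1^\natural,u_2^\natural) = 0$,
$\not\models f(u_1^\natural,u_2^\natural)$, so $\models C$.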

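Direction $\gamma_F(\mu(S)) \subseteq \gamma(S)$. Let $S^\natural \in \gamma_F(\mu(S))$; then all
conjuncts of $\mu(S)$ are true in $S^\natural$. We show that $S^\natural \sqsubseteq^h S$
where $h$ is defined in the following way. Consider any $u^\natural \in U^\natural$.
There is exactly one cube $C(x)$ such that $\models C(u^\natural)$.
Moreover, because $\mu(S)$ contains $\lnot\exists x.P(x)$ for cubes $P(x)$
other than $\pi(u)(x)$, the cube $C(x)$ is of the form $\pi(u)(x)$ for
some $u \in U$. Define $h(u^\natural) = u$. This defines the function
$h$. By construction, (7) holds. Furthermore, $h$ is surjective:
for each $u \in U$, the conjunct $\exists x.\pi(u)(x)$ is in $\mu(S)$, so there
exists $u^\natural$ such that $\models \pi(u)(u^\natural)$ and thus $h(u^\natural) = u$. We next
show that $h$ is a homomorphism.

1) Let us show


$\{\iota^\natural(A)(u^\natural) \mid h(u^\natural) = u\} \subseteq \iota(A)(u)$


for all $A \in \mathcal{A}$ and for all $u \in U$. Consider $A \in \mathcal{A}_1$ and $u^\natural$ such
that $h(u^\natural) = u$. Then $\models \pi(u)(u^\natural)$, so $\models A^{\alpha_A}(u^\natural)$, which
implies $\iota^\natural(A)(u^\natural) \in \iota(A)(u)$. Next, consider $A \in \mathcal{A} \setminus \mathcal{A}_1$.
If $\iota(A)(u) = \{0,1\}$ the property trivially holds. Consider
$\iota(A)(u) = \{1\}$. Then $\lnot\exists x.\pi(u)(x) \land \lnot A(x)$ occurs in $\mu(S)$.
Therefore, if $h(u^\natural) = u$, then $\models A(u^\natural)$, otherwise the conjunct
would be false. Therefore, $\iota^\natural(A)(u^\natural) = 1 \in \iota(A)(u)$. The
case $\iota(A)(u) = \{0\}$ is analogous: $\lnot\exists x.\pi(u)(x) \land A(x)$ occurs
in $\mu(S)$, so if $h(u^\natural) = u$ then $\not\models A(u^\natural)$, and $\iota^\natural(A)(u^\natural) = 0 \in \iota(A)(u)$.


2) Let us show


$\{\iota^\natural(f)(u_1^\natural,u_2^\natural) \mid h(u_1^\natural) = u_1 \land h(u_2^\natural) = u_2\} \subseteq \iota(f)(u_1,u_2)$

for all $f \in \mathcal{F}$ and $u_1,u_2 \in U$. This is similar to 1). If
$\iota(f)(u_1,u_2) = \{0,1\}$, the inclusion trivially holds. Consider the
case $\iota(f)(u_1,u_2) = \{1\}$. Then $\lnot\exists x \exists y.\pi(u_1)(x) \land \pi(u_2)(y) \land \lnot f(x,y)$
occurs in $\mu(S)$. Suppose that $h(u_1^\natural) = u_1$ and
$h(u_2^\natural) = u_2$. Then $\models \pi(u_1)(u_1^\natural)$ and $\models \pi(u_2)(u_2^\natural)$, so
$\models f(u_1^\natural,u_2^\natural)$ as well, otherwise the conjunct would be false.
Therefore, $\iota^\natural(f)(u_1^\natural,u_2^\natural) = 1$ and the inclusion holds. The
case $\iota(f)(u_1,u_2) = \{0\}$ is analogous: $\lnot\exists x \exists y.\pi(u_1)(x) \land \pi(u_2)(y) \land f(x,y)$
occurs in $\mu(S)$, so if $h(u_1^\natural) = u_1$ and
$h(u_2^\natural) = u_2$, then $\models \pi(u_1)(u_1^\natural)$ and $\models \pi(u_2)(u_2^\natural)$, so
$\not\models f(u_1^\natural,u_2^\natural)$ and $\iota^\natural(f)(u_1^\natural,u_2^\natural) = 0$. The inclusion
holds, and $S^\natural \sqsubseteq^h S$.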

Because every structure $S$ has a corresponding equivalent
formula $\mu(S)$, we conclude $\mathrm{models}[\mathcal{T}_1] \subseteq \mathrm{models}[\mathcal{R}_1]$. To
conclude $\mathrm{models}[\mathcal{T}_1] \supseteq \mathrm{models}[\mathcal{R}_1]$, we show that $\mu$ is surjective:
every canonical conjunction $F$ of $\mathcal{R}_1$-literals is equal
to $\mu(S)$ for some structure $S$.

Let $F$ be a canonical conjunction of $\mathcal{R}_1$-literals. For each
cube $P(x)$ such that $\exists x.P(x)$ occurs in $F$, let $u_{P(x)}$ be a distinct
element. Let $U$ be the set of all such elements $u_{P(x)}$.
Property 2 of Definition 10 ensures that $U$ is non-empty.
Let $\alpha$ be such that $P(x) = \bigwedge_{A \in \mathcal{A}_1} A^{\alpha(A)}(x)$. Then define
$\iota(A)(u_{P(x)}) = \{\alpha(A)\}$ for all $A \in \mathcal{A}_1$. For $A \in \mathcal{A} \setminus \mathcal{A}_1$, define
$\iota(A)(u_{P(x)})$ as $\{1\}$ if $\lnot\exists x.P(x) \land \lnot A(x)$ occurs in $F$, as $\{0\}$ if
$\lnot\exists x.P(x) \land A(x)$ occurs in $F$, and as $\{0,1\}$ otherwise. Such a
definition of $\iota(A)(u_{P(x)})$ is possible because of
Property 4 of Definition 10. Analogously, using Property 5 of
Definition 10, for each $f \in \mathcal{F}$, define $\iota(f)(u_{P_1(x)}, u_{P_2(x)})$ as
$\{1\}$ if $\lnot\exists x \exists y.P_1(x) \land P_2(y) \land \lnot f(x,y)$ occurs in $F$, as $\{0\}$ if
$\lnot\exists x \exists y.P_1(x) \land P_2(y) \land f(x,y)$ occurs in $F$, and as $\{0,1\}$ otherwise.
Let $S = (U,\iota)$. To show $F = \mu(S)$, recall first that we
use an abstract view of the syntax that takes into account
associativity, commutativity and idempotence of conjunction.
It therefore suffices to show that $F$ and $\mu(S)$ contain the
same set of conjuncts. It is easy to see that each conjunct
of $\mu(S)$ occurs in $F$. The converse is also straightforward by
Definition 10.

We conclude that $\mu$ is surjective, and $\mathrm{models}[\mathcal{R}_1] \subseteq \mathrm{models}[\mathcal{T}_1]$,
which completes the proof.

Although this fact is not needed for the proof, we remark
that $\mu$ is also injective, so $\mu$ is, in fact, a bijection between
the $\mathcal{A}_1$-bounded structures in 3-STRUCT and the set of canonical
conjunctions of $\mathcal{R}_1$-literals. ∎


We proceed to show that a syntactically richer class of
formulas defines the same set of constraints as $\mathcal{R}_1$-formulas.


Definition 12 ($\mathcal{R}_2$-formulas) An $\mathcal{R}_2$-formula is a disjunction
of (not necessarily canonical) conjunctions of $\mathcal{R}_1$-literals.


The proof of the following Lemma 13 provides a normalization
algorithm that converts every conjunction of $\mathcal{R}_1$-literals
into an equivalent disjunction of canonical conjunctions
of $\mathcal{R}_1$-literals.


Lemma 13 Each conjunction of $\mathcal{R}_1$-literals can be written
as an equivalent $\mathcal{R}_1$-formula.


Proof. Consider an arbitrary, not necessarily canonical, conjunction F of R1-literals. We show how to transform F into an equivalent disjunction of canonical conjunctions of R1-literals. The idea is to transform each conjunction into a disjunction of multiple conjunctions to ensure that all properties in Definition 10 are satisfied. We perform the following transformations as long as some property of Definition 10 is violated.

Property 1. If both ∃x.P(x) and ¬∃x.P(x) occur, use the rule Q ∧ ¬Q → false and eliminate the entire conjunction from the disjunction of conjunctions. If neither ∃x.P(x) nor ¬∃x.P(x) occurs, use the rule true → Q ∨ ¬Q to introduce the missing literal for P(x), and then distribute the disjunction to the top level of the formula.

Property 2. First ensure that Property 1 holds. If the resulting conjunction contains no conjuncts of the form ∃x.P(x), then the conjunction contains a conjunct ¬∃x.P(x) for every cube P(x) over A1. Therefore, the entire conjunction is false and can be eliminated from the disjunction of conjunctions.

Property 3. First ensure that Property 1 holds. Then, if the literal ¬∃x.P(x) occurs in the conjunction, remove from the conjunction all other literals containing P(x). Such literals are of the form ¬∃x.P(x) ∧ F1(x) for some F1(x), ¬∃x∃y.P(x) ∧ F1(x,y) for some F1(x,y), or ¬∃x∃y.P(y) ∧ F1(x,y) for some F1(x,y); all these literals are implied by ¬∃x.P(x), so removing them yields an equivalent formula.

Property 4. If both conjuncts ¬∃x.P(x) ∧ A(x) and ¬∃x.P(x) ∧ ¬A(x) occur, replace them with the equivalent conjunct ¬∃x.P(x).

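Property 5. If both

\[ \neg\exists x \exists y.\; P_1(x) \wedge P_2(y) \wedge f(x,y) \]

and

\[ \neg\exists x \exists y.\; P_1(x) \wedge P_2(y) \wedge \neg f(x,y) \]

occur, replace them with

\[ (\neg\exists x.\, P_1(x)) \vee (\neg\exists y.\, P_2(y)), \]

then propagate the disjunction to the top level of the formula. ∎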
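The Property 5 replacement can be sanity-checked by exhaustive enumeration over small structures. The following is only an illustrative sketch: the function name is hypothetical, and P1 and P2 are modelled as arbitrary unary predicates (a weaker assumption than being cubes), so the check covers the cube case as a special instance.

```python
from itertools import product

def check_property5_rule(max_size=3):
    """Compare both sides of the Property 5 rewrite on all structures up to max_size nodes."""
    for n in range(1, max_size + 1):
        pairs = [(x, y) for x in range(n) for y in range(n)]
        for p1 in product([False, True], repeat=n):
            for p2 in product([False, True], repeat=n):
                for bits in product([False, True], repeat=len(pairs)):
                    f = dict(zip(pairs, bits))
                    # not exists x, y. P1(x) and P2(y) and f(x, y)
                    no_f = not any(p1[x] and p2[y] and f[x, y] for x, y in pairs)
                    # not exists x, y. P1(x) and P2(y) and not f(x, y)
                    no_not_f = not any(p1[x] and p2[y] and not f[x, y] for x, y in pairs)
                    lhs = no_f and no_not_f
                    # (not exists x. P1(x)) or (not exists y. P2(y))
                    rhs = (not any(p1)) or (not any(p2))
                    if lhs != rhs:
                        return False
    return True
```

A finite enumeration of this kind is not a proof, but it is a cheap way to catch a wrong connective when implementing the normalization.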


Lemma 13 implies that R2-formulas, although a syntactically richer class, are no more expressive than R1-formulas, hence Corollary 14.


Corollary 14 models[R2] = models[R1]


Proof. models[R1] ⊆ models[R2] because R2 is a richer class of formulas. Conversely, let S♮ ∈ models[R2] be a set of two-valued structures. Then S♮ = γ_F(F) for some R2-formula F. By Lemma 13, let F′ be an R1-formula obtained by transforming conjunctions of F into disjunctions of canonical conjunctions of R1-literals. F′ is an R1-formula equivalent to F. Therefore, S♮ = γ_F(F′), and S♮ ∈ models[R1]. ∎


Definition 15 (Positive Boolean Combination) If B(p1,...,pn) is a formula built from p1,...,pn using ∧, ∨, ¬, we say that pi (for 1 ≤ i ≤ n) occurs positively in B(p1,...,pn) iff pi occurs under an even number of ¬ signs. We say that B(p1,...,pn) is a positive boolean combination iff each of p1,...,pn occurs positively in B(p1,...,pn).


Definition 16 (R3-formulas) An R3-formula is a positive boolean combination of R1-literals.


Lemma 17 states that R2-formulas are simply the disjunctive normal forms of R3-formulas.


Lemma 17 Every R3-formula is equivalent to an R2-formula.


Proof. Let F be an R3-formula. Then the disjunctive normal form of F is an R2-formula. ∎


Corollary 18 models[R3] = models[R2]


Proof. By Lemma 17. ∎
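The step from a positive boolean combination to disjunctive normal form can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: the tuple encoding and the function names are assumptions, and R1-literals are treated as opaque atom names. Negations are pushed inward with De Morgan's laws; for a positive combination every atom ends up unnegated, and the sketch raises an error otherwise.

```python
def nnf(formula, negated=False):
    """Push negations inward; reject atoms that end up under an odd number of negations."""
    op = formula[0]
    if op == "lit":
        if negated:
            raise ValueError("atom %r does not occur positively" % (formula[1],))
        return formula
    if op == "not":
        return nnf(formula[1], not negated)
    left, right = nnf(formula[1], negated), nnf(formula[2], negated)
    flipped = {"and": "or", "or": "and"}  # De Morgan under a pending negation
    return (flipped[op] if negated else op, left, right)

def dnf(formula):
    """DNF of a negation-free formula: a set of conjunctions, each a frozenset of atoms."""
    op = formula[0]
    if op == "lit":
        return {frozenset([formula[1]])}
    left, right = dnf(formula[1]), dnf(formula[2])
    if op == "or":
        return left | right
    return {a | b for a in left for b in right}  # distribute "and" over "or"

def positive_to_dnf(formula):
    return dnf(nnf(formula))

# (a or b) and c  becomes  (a and c) or (b and c)
example = ("and", ("or", ("lit", "a"), ("lit", "b")), ("lit", "c"))
```

Each resulting frozenset stands for one (not necessarily canonical) conjunction of R1-literals, so the output has the shape of an R2-formula.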


In the sequel we observe that replacing cubes over A1 in the definition of R1-literals with boolean combinations over A1 does not change the set of expressible sets of two-valued structures. Definition 19 generalizes Definition 9.


Definition 19 (R4-literals) Let B1(x), B2(y) range over arbitrary boolean combinations of elements of A1, let Q(x) range over disjunctions of literals of the form A(x) and the form ¬A(x) for A ∈ A \ A1, and let g(x,y) range over disjunctions of literals of the form f(x,y) and ¬f(x,y) where f ∈ F.

An R4-literal is a formula of one of the following forms:

1. ∃x.B1(x)
2. ¬∃x.B1(x) ∧ Q(x)
3. ¬∃x∃y.B1(x) ∧ B2(y) ∧ g(x,y)


Definition 20 An R4-formula is a positive boolean combi- 
nation of R4-literals. 


Lemma 21 models[R4] = models[R3] 


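\[
\begin{array}{l}
\exists x.\, B_1(x) \vee B_2(x) \;\to\; (\exists x.\, B_1(x)) \vee (\exists x.\, B_2(x)) \\[4pt]
\neg\exists x.\, B_1(x) \vee B_2(x) \;\to\; (\neg\exists x.\, B_1(x)) \wedge (\neg\exists x.\, B_2(x)) \\[4pt]
\neg\exists x.\, (B_1(x) \vee B_2(x)) \wedge Q(x) \;\to\; \\
\quad (\neg\exists x.\, B_1(x) \wedge Q(x)) \wedge (\neg\exists x.\, B_2(x) \wedge Q(x)) \\[4pt]
\neg\exists x.\, B_1(x) \wedge (Q_1(x) \vee Q_2(x)) \;\to\; \\
\quad (\neg\exists x.\, B_1(x) \wedge Q_1(x)) \wedge (\neg\exists x.\, B_1(x) \wedge Q_2(x)) \\[4pt]
\neg\exists x \exists y.\, (B_{11}(x) \vee B_{12}(x)) \wedge B_2(y) \wedge g(x,y) \;\to\; \\
\quad (\neg\exists x \exists y.\, B_{11}(x) \wedge B_2(y) \wedge g(x,y)) \wedge (\neg\exists x \exists y.\, B_{12}(x) \wedge B_2(y) \wedge g(x,y)) \\[4pt]
\neg\exists x \exists y.\, B_1(x) \wedge (B_{21}(y) \vee B_{22}(y)) \wedge g(x,y) \;\to\; \\
\quad (\neg\exists x \exists y.\, B_1(x) \wedge B_{21}(y) \wedge g(x,y)) \wedge (\neg\exists x \exists y.\, B_1(x) \wedge B_{22}(y) \wedge g(x,y)) \\[4pt]
\neg\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge (g_1(x,y) \vee g_2(x,y)) \;\to\; \\
\quad (\neg\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge g_1(x,y)) \wedge (\neg\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge g_2(x,y))
\end{array}
\]

Figure 1: Transforming R4-literals into R1-literals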


Proof. Note that a formula of the form ¬∃x.B1(x) is equivalent to the formula ¬∃x.B1(x) ∧ (A(x) ∨ ¬A(x)), which is of the form ¬∃x.B1(x) ∧ Q(x). Therefore, R4 is a richer class than R3, so models[R4] ⊇ models[R3]. To show the converse, we transform each R4-literal into a positive boolean combination of R1-literals.

First, transform each boolean combination B(x) (and B(y)) of A1 predicates into canonical disjunctive normal form, so that each B(x) is a disjunction of cubes. Then apply the rules in Figure 1 to decompose R4-literals into R1-literals. ∎


By eliminating the top-level negation from R4-literals we obtain R5-literals, which use universal quantifiers.


Definition 22 (R5-literals) Let B1(x), B2(y) be variables denoting arbitrary boolean combinations of elements of A1, let Qp(x) denote conjunctions of literals of the form A(x) and of the form ¬A(x) for A ∈ A \ A1, and let gp(x,y) denote conjunctions of literals of the form f(x,y) and of the form ¬f(x,y) for f ∈ F.

An R5-literal is a formula of one of the following forms:

1. ∃x.B1(x)
2. ∀x.B1(x) ⇒ Qp(x)
3. ∀x∀y.B1(x) ∧ B2(y) ⇒ gp(x,y)


Definition 23 An R5-formula is a positive boolean combination of R5-literals.


Lemma 24 models[R5] = models[R4]


Proof. ∀x.B1(x) ⇒ Qp(x) corresponds to ¬∃x.B1(x) ∧ Q(x) with Qp = ¬Q, whereas ∀x∀y.B1(x) ∧ B2(y) ⇒ gp(x,y) corresponds to ¬∃x∃y.B1(x) ∧ B2(y) ∧ g(x,y) with gp = ¬g. ∎


In the end we introduce R6-formulas. Like heap abstractions based on may-edges, R6-formulas implicitly indicate the absence of edges by specifying the set of possible endpoints for each edge.


Definition 25 (R6-literals) Let B1(x), B2(y) denote arbitrary boolean combinations of elements of A1, let Qp(x) denote conjunctions of literals A(x) and ¬A(x) for A ∈ A \ A1, and let f denote elements of F.

An R6-literal is a formula of one of the following forms:

1. ∃x.B1(x) (node existence)
2. ∀x.B1(x) ⇒ Qp(x) (node properties)
3. ∀x∀y.B1(x) ∧ f(x,y) ⇒ B2(y) (may-edges)
4. ∀x∀y.B1(x) ∧ B2(y) ⇒ f(x,y) (must-edges)


Definition 26 An R6-formula is a positive boolean combination of R6-literals.


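Lemma 27 models[R6] = models[R5]

Proof. Observe first that the may-edge literal

\[ \forall x \forall y.\; B_1(x) \wedge f(x,y) \Rightarrow B_2(y) \]

is equivalent, by contraposition, to

\[ \forall x \forall y.\; B_1(x) \wedge \neg B_2(y) \Rightarrow \neg f(x,y) \tag{9} \]

which is an R5-literal. Conversely, every R5-literal can be shown to be equivalent to a conjunction of may-edge and must-edge R6-literals using the transformation

\[
\begin{array}{l}
\forall x \forall y.\; B_1(x) \wedge \neg B_2(y) \Rightarrow g_1(x,y) \wedge g_2(x,y) \;\to\; \\
\quad (\forall x \forall y.\; B_1(x) \wedge \neg B_2(y) \Rightarrow g_1(x,y)) \;\wedge \\
\quad (\forall x \forall y.\; B_1(x) \wedge \neg B_2(y) \Rightarrow g_2(x,y))
\end{array}
\tag{10}
\]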
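The equivalence between a may-edge literal and its contrapositive form can be checked exhaustively on small structures. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and B1 and B2 are modelled directly as arbitrary unary predicates on the nodes, which is what a boolean combination of A1 predicates evaluates to.

```python
from itertools import product

def may_edge_forms_agree(max_size=3):
    """Check: (B1(x) and f(x,y) => B2(y)) for all x, y  iff  (B1(x) and not B2(y) => not f(x,y))."""
    for n in range(1, max_size + 1):
        pairs = [(x, y) for x in range(n) for y in range(n)]
        for b1 in product([False, True], repeat=n):
            for b2 in product([False, True], repeat=n):
                for bits in product([False, True], repeat=len(pairs)):
                    f = dict(zip(pairs, bits))
                    may_edge = all(b2[y] for x, y in pairs if b1[x] and f[x, y])
                    contrapositive = all(not f[x, y] for x, y in pairs
                                         if b1[x] and not b2[y])
                    if may_edge != contrapositive:
                        return False
    return True
```

Read operationally, the may-edge form lists the allowed targets of f-edges leaving B1-nodes, while the contrapositive form lists the node pairs between which no f-edge may exist.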


The following Corollary 28 summarizes the results on different representations of constraints corresponding to three-valued structures.

Corollary 28

\[ \mathrm{models}[T_1] = \mathrm{models}[R_1] = \mathrm{models}[R_2] = \mathrm{models}[R_3] = \mathrm{models}[R_4] = \mathrm{models}[R_5] = \mathrm{models}[R_6] \]


3.1 Closure under Disjunction and Conjunction 


By definition, the syntactic class of R3-formulas is closed under disjunction and conjunction. As Corollary 29 below observes, this provides a way to compute the (disjunction and) conjunction of three-valued structures.


Corollary 29 The family of sets models[T1] is closed under union and intersection.


Proof. The closure under union is trivial because union of sets of three-valued structures corresponds to the union of their models. For the closure under intersection, consider two sets of three-valued structures S1 and S2. Let F1 be an R3-formula such that γ_F(F1) = γ*(S1) and F2 an R3-formula such that γ_F(F2) = γ*(S2). Then F1 ∧ F2 is also an R3-formula, and the set of three-valued structures corresponding to F1 ∧ F2 denotes the desired intersection. ∎


4 Three-Valued Structures with Tight Concretization


This section examines the constraints that arise from the meaning of sets of three-valued structures under tight concretization. These constraints are slightly more expressive than the constraints in Section 3, as Section 4.2 shows. Interestingly, the added expressive power is just enough to make the constraints in this section closed under all boolean operations (Section 4.1). These closure properties are in contrast to the properties of the constraints in Section 3, which are closed only under union and intersection. The closure under boolean operations allows, for example, reducing the implication of constraints to the satisfiability of constraints.

The structure of this section mirrors the structure of Section 3. We start by defining the interpretation of three-valued structures under tight concretization.

The following definition corresponds to [48, Definition 3.6], Chapter 7]. Compared to our definition in Section 3, the only difference is the use of “=” instead of “⊇” in condition 1 on ι(A) and in condition 2 on ι(f).


Definition 30 (Tight Concretization) Let S♮ = (U♮, ι♮) be a two-valued structure, let S = (U, ι) be a three-valued structure, and let h : U♮ → U be a surjective total function. We write S♮ ⊑_T^h S iff

1. for every A ∈ A and u ∈ U:

\[ \iota(A)(u) = \{\, \iota^\natural(A)(u^\natural) \mid h(u^\natural) = u \,\} \]

2. for every f ∈ F and u1, u2 ∈ U:

\[ \iota(f)(u_1,u_2) = \{\, \iota^\natural(f)(u_1^\natural, u_2^\natural) \mid h(u_1^\natural) = u_1 \wedge h(u_2^\natural) = u_2 \,\} \]

We write S♮ ⊑_T S iff there exists a surjective total function h such that S♮ ⊑_T^h S, and in that case we call h a homomorphism. The tight concretization of a three-valued structure S is given by:

\[ \gamma_T(S) = \{\, S^\natural \mid S^\natural \sqsubseteq_T S \,\} \]

We extend γ_T to γ*_T, which acts on sets of three-valued structures so that the set denotes a disjunction:

\[ \gamma_T^*(\mathcal{S}) = \bigcup_{S \in \mathcal{S}} \gamma_T(S) \]
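The difference between “=” and “⊇” can be made concrete with a small checker for a given map h. The following is a minimal sketch under stated assumptions: the encoding of structures as Python tuples and dictionaries, the function name `embeds`, and the predicate name `A` are all hypothetical and chosen only for illustration.

```python
def embeds(concrete, abstract, h, tight):
    """Check conditions 1 and 2 for a given map h (dict: concrete node -> abstract node).

    tight=True requires equality of value sets (tight concretization);
    tight=False only requires the abstract value to contain the observed values.
    """
    c_nodes, c_unary, c_binary = concrete
    a_nodes, a_unary, a_binary = abstract
    if {h[n] for n in c_nodes} != set(a_nodes):
        return False  # h must be surjective
    def ok(abstract_value, seen):
        return abstract_value == seen if tight else abstract_value >= seen
    for name, table in a_unary.items():
        for u in a_nodes:
            seen = {c_unary[name][n] for n in c_nodes if h[n] == u}
            if not ok(set(table[u]), seen):
                return False
    for name, table in a_binary.items():
        for u1 in a_nodes:
            for u2 in a_nodes:
                seen = {c_binary[name][n1, n2] for n1 in c_nodes for n2 in c_nodes
                        if h[n1] == u1 and h[n2] == u2}
                if not ok(set(table[u1, u2]), seen):
                    return False
    return True

# One abstract node "s" on which A is unknown ({0, 1}); no binary predicates.
abstract = (["s"], {"A": {"s": {0, 1}}}, {})
all_ones = (["a", "b"], {"A": {"a": 1, "b": 1}}, {})
mixed = (["a", "b"], {"A": {"a": 1, "b": 0}}, {})
h = {"a": "s", "b": "s"}
```

In this example `all_ones` is accepted only in the non-tight reading, because the value {0, 1} on `s` promises, under tight concretization, that both truth values are actually realized by some concrete node, as in `mixed`.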


Definition 31 (Tight Concretization Definability) The set of sets of two-valued structures definable via three-valued structures with tight concretization is defined by:

\[ \mathrm{models}[T_2] = \{\, \gamma_T^*(\mathcal{S}) \mid \mathcal{S} \text{ a finite set of } A_1\text{-bounded three-valued structures} \,\} \]


TR1-literals are used to build formulas that characterize models[T2].


Definition 32 (TR1-literal) Let P1(x), P2(x) range over cubes over A1, let A range over elements of A \ A1, and let f range over F. A TR1-atomic-formula is a formula of one of the following forms:

\[
\begin{array}{l}
\exists x.\, P_1(x) \;\mid \\
\exists x.\, P_1(x) \wedge A(x) \;\mid \\
\exists x.\, P_1(x) \wedge \neg A(x) \;\mid \\
\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge f(x,y) \;\mid \\
\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge \neg f(x,y)
\end{array}
\]

A TR1-literal is a TR1-atomic-formula or its negation.


TR1-formulas satisfy syntactic invariants that make them isomorphic to three-valued structures under tight concretization.


Definition 33 (TR1-formulas) Let P(x), P1(x), P2(y) denote cubes over A1. A canonical conjunction of TR1-literals is a conjunction of TR1-literals that satisfies the following conditions.

1. for each P(x) a cube over A1, exactly one of the conjuncts ∃x.P(x) and ¬∃x.P(x) occurs;

2. there is at least one cube P(x) such that the conjunct ∃x.P(x) occurs in the conjunction;

3. if the conjunct ¬∃x.P(x) occurs, then this conjunct is the only occurrence of the cube P(x) (and the cube P(y)) in the conjunction;

4. for each cube P(x) and A ∈ A \ A1, exactly one of the following three conditions holds:

(a) ¬∃x.P(x) ∧ A(x) occurs in the conjunction,

(b) ¬∃x.P(x) ∧ ¬A(x) occurs in the conjunction,

(c) both ∃x.P(x) ∧ A(x) and ∃x.P(x) ∧ ¬A(x) occur in the conjunction;

5. for every two cubes P1(x) and P2(y) and every f ∈ F, exactly one of the following three conditions holds:

(a) ¬∃x∃y.P1(x) ∧ P2(y) ∧ f(x,y) occurs in the conjunction;

(b) ¬∃x∃y.P1(x) ∧ P2(y) ∧ ¬f(x,y) occurs in the conjunction;

(c) both ∃x∃y.P1(x) ∧ P2(y) ∧ f(x,y) and ∃x∃y.P1(x) ∧ P2(y) ∧ ¬f(x,y) occur in the conjunction.

A TR1-formula is a disjunction of canonical conjunctions of TR1-literals.


The following Proposition 34 shows that TR1-formulas capture precisely the meaning of three-valued structures under tight concretization. The proof of Proposition 34 is similar to the proof of Proposition 11, and is similarly straightforward.


Proposition 34 models[TR1] = models[T2]


Proof. The idea of this proof is similar to the idea of the proof of Proposition 11: the meaning of each bounded three-valued structure under the tight concretization is equal to the meaning of some canonical conjunction of TR1-literals, and conversely. Therefore, disjunctions of canonical conjunctions of TR1-literals correspond to sets of bounded three-valued structures under the tight concretization.


We next give a function μ mapping each bounded three-valued structure S to a canonical conjunction of TR1-literals μ(S). We show that S under tight concretization and μ(S) represent the same set of two-valued structures. Moreover, μ is surjective.

Let S = (U,ι) be an A1-bounded three-valued structure. Define the formula μ(S) as the conjunction of the following TR1-literals. Define π as in the proof of Proposition 11.

Node existence. For each u ∈ U, introduce the TR1-literal

\[ \exists x.\, \pi(u)(x) \tag{11} \]

For each remaining A1-cube P(x), that is, for each cube P(x) such that π(u)(x) ≢ P(x) for all u ∈ U, introduce the TR1-literal

\[ \neg\exists x.\, P(x) \tag{12} \]

Node properties. Let u ∈ U and A ∈ A \ A1. If ι(A)(u) = {1}, introduce the TR1-literal

\[ \neg\exists x.\, \pi(u)(x) \wedge \neg A(x) \tag{13} \]

If ι(A)(u) = {0}, introduce the literal

\[ \neg\exists x.\, \pi(u)(x) \wedge A(x) \tag{14} \]

If ι(A)(u) = {0,1}, introduce the following two TR1-literals

\[
\begin{array}{l}
\exists x.\, \pi(u)(x) \wedge A(x) \\
\exists x.\, \pi(u)(x) \wedge \neg A(x)
\end{array}
\tag{15}
\]

Edges. Let u1, u2 ∈ U (we allow u1 = u2) and let f ∈ F. If ι(f)(u1,u2) = {1}, introduce the TR1-literal

\[ \neg\exists x \exists y.\, \pi(u_1)(x) \wedge \pi(u_2)(y) \wedge \neg f(x,y) \tag{16} \]

If ι(f)(u1,u2) = {0}, introduce the TR1-literal

\[ \neg\exists x \exists y.\, \pi(u_1)(x) \wedge \pi(u_2)(y) \wedge f(x,y) \tag{17} \]

If ι(f)(u1,u2) = {0,1}, introduce the following two TR1-literals

\[
\begin{array}{l}
\exists x \exists y.\, \pi(u_1)(x) \wedge \pi(u_2)(y) \wedge f(x,y) \\
\exists x \exists y.\, \pi(u_1)(x) \wedge \pi(u_2)(y) \wedge \neg f(x,y)
\end{array}
\tag{18}
\]

We define μ(S) as the conjunction of all formulas (11), (12), (13), (14), (15), (16), (17), (18). We next show γ_F(μ(S)) = γ_T(S). As in the proof of Proposition 11, we use the property (7) for a homomorphism h from S♮ to S.
Direction γ_F(μ(S)) ⊇ γ_T(S). Let S♮ ∈ γ_T(S). Then S♮ ⊑_T^h S for some homomorphism h. We establish that (7) holds for h. Let u = h(u♮). For A ∈ A1, we have {ι♮(A)(u♮)} = ι(A)(u), so u♮ satisfies the literal for A in the cube π(u). Therefore ⊨ π(u)(u♮), which establishes (7). We next show S♮ ⊨ C for each conjunct C of μ(S).

1) Consider C = ∃x.π(u)(x) for some u. Because h is a surjection, h(u♮) = u for some u♮, so ⊨ π(u)(u♮), and thus ⊨ C.

2) Consider C = ¬∃x.P(x) for the cube P(x) distinct from all cubes π(u)(x). Consider any u♮ ∈ U♮. Then ⊨ π(h(u♮))(u♮), and P(x) and π(h(u♮))(x) are distinct cubes, so ⊭ P(u♮). Therefore ⊨ C.

3) Consider C = ¬∃x.π(u)(x) ∧ ¬A(x) for some A ∈ A \ A1. This means that ι(A)(u) = {1}. Consider any u♮. If ⊭ π(u)(u♮), then u♮ cannot violate C. If ⊨ π(u)(u♮), then h(u♮) = u by (7), and because h is a homomorphism, ι♮(A)(u♮) = 1, so ⊭ ¬A(u♮). Hence ⊨ C.

4) Consider C = ¬∃x.π(u)(x) ∧ A(x) for some A ∈ A \ A1. Analogously to the previous case, ι(A)(u) = {0}. Consider any u♮. If ⊨ π(u)(u♮), then h(u♮) = u, and because h is a homomorphism, ι♮(A)(u♮) = 0, so ⊭ A(u♮). Hence ⊨ C.

5) Consider conjuncts (15). These conjuncts occur only when ι(A)(u) = {0,1}. By the definition of tight concretization, there exists u♮ ∈ U♮ such that h(u♮) = u and ι♮(A)(u♮) = 1. By property (7) of h, ⊨ π(u)(u♮) and thus ⊨ ∃x.π(u)(x) ∧ A(x). Analogously, by the definition of tight concretization there exists v♮ ∈ U♮ such that h(v♮) = u and ι♮(A)(v♮) = 0, so ⊨ ∃x.π(u)(x) ∧ ¬A(x).

6) Consider C = ¬∃x∃y.π(u1)(x) ∧ π(u2)(y) ∧ ¬f(x,y). Then ι(f)(u1,u2) = {1}. Consider any u1♮, u2♮ ∈ U♮. If ⊭ π(u1)(u1♮) or ⊭ π(u2)(u2♮), then the pair cannot violate C. Suppose ⊨ π(u1)(u1♮) and ⊨ π(u2)(u2♮). Then h(u1♮) = u1 and h(u2♮) = u2; h is a homomorphism, so ι♮(f)(u1♮,u2♮) = 1, ⊭ ¬f(u1♮,u2♮) and ⊨ C.

7) Consider C = ¬∃x∃y.π(u1)(x) ∧ π(u2)(y) ∧ f(x,y). Analogously to the previous case, ι(f)(u1,u2) = {0}; for any u1♮, u2♮ ∈ U♮, if ⊨ π(u1)(u1♮) and ⊨ π(u2)(u2♮), then h(u1♮) = u1 and h(u2♮) = u2, so ι♮(f)(u1♮,u2♮) = 0, ⊭ f(u1♮,u2♮) and ⊨ C.

8) Consider conjuncts (18). These conjuncts occur only when ι(f)(u1,u2) = {0,1}. By the definition of tight concretization, there exist u1♮, u2♮ ∈ U♮ such that h(u1♮) = u1, h(u2♮) = u2, and ι♮(f)(u1♮,u2♮) = 1. By property (7) of h, ⊨ π(u1)(u1♮) and ⊨ π(u2)(u2♮), and thus ⊨ ∃x∃y.π(u1)(x) ∧ π(u2)(y) ∧ f(x,y). Analogously, by the definition of tight concretization, there exist v1♮, v2♮ ∈ U♮ such that h(v1♮) = u1, h(v2♮) = u2, and ι♮(f)(v1♮,v2♮) = 0, so ⊨ ∃x∃y.π(u1)(x) ∧ π(u2)(y) ∧ ¬f(x,y).
Direction γ_F(μ(S)) ⊆ γ_T(S). Let S♮ ∈ γ_F(μ(S)); then all conjuncts of μ(S) are true in S♮. We show that S♮ ⊑_T^h S, where h is defined in the same way as in the proof of Proposition 11, so that (7) holds and h is surjective. We show that the homomorphism conditions of Definition 30 are satisfied for h.

1) Let us show

\[ \{\, \iota^\natural(A)(u^\natural) \mid h(u^\natural) = u \,\} = \iota(A)(u) \]

for all A ∈ A and for all u ∈ U. Consider first A ∈ A1, and any u♮ such that h(u♮) = u. Then ⊨ π(u)(u♮), so u♮ satisfies the literal for A in π(u), which implies {ι♮(A)(u♮)} = ι(A)(u). Moreover, because ∃x.π(u)(x) holds, the left-hand side is a non-empty set, so it is equal to ι(A)(u).

Next, consider A ∈ A \ A1. Consider first the case ι(A)(u) = {1}. Then ¬∃x.π(u)(x) ∧ ¬A(x) occurs in μ(S). Therefore, if h(u♮) = u, then A(u♮) holds, otherwise the conjunct would be false. Therefore, ι♮(A)(u♮) = 1. The left-hand side is non-empty, so the equality holds.

The case ι(A)(u) = {0} is analogous: ¬∃x.π(u)(x) ∧ A(x) occurs in μ(S), so if h(u♮) = u then A(u♮) does not hold, so ι♮(A)(u♮) = 0, and the left-hand side is non-empty, so the equality holds.

If ι(A)(u) = {0,1} then the conjuncts (15) hold. Therefore, there exists a node u♮ ∈ U♮ such that h(u♮) = u and ι♮(A)(u♮) = 1, and there exists a node v♮ ∈ U♮ such that h(v♮) = u and ι♮(A)(v♮) = 0. The left-hand side is a set containing both 0 and 1, so it is equal to {0,1}.

2) Let us show

\[ \{\, \iota^\natural(f)(u_1^\natural,u_2^\natural) \mid h(u_1^\natural) = u_1 \wedge h(u_2^\natural) = u_2 \,\} = \iota(f)(u_1,u_2) \tag{19} \]

for all f ∈ F and u1, u2 ∈ U. This is similar to 1). Consider first the case ι(f)(u1,u2) = {1}. Then ¬∃x∃y.π(u1)(x) ∧ π(u2)(y) ∧ ¬f(x,y) occurs in μ(S). Suppose that h(u1♮) = u1 and h(u2♮) = u2. Then π(u1)(u1♮) and π(u2)(u2♮) hold, so f(u1♮,u2♮) must hold as well, otherwise the conjunct would be false. Therefore, ι♮(f)(u1♮,u2♮) = 1. Moreover, because of the conjunct ∃x.π(u1)(x) and the conjunct ∃x.π(u2)(x), the left-hand side is non-empty, so it is equal to {1}.

The case ι(f)(u1,u2) = {0} is analogous: ¬∃x∃y.π(u1)(x) ∧ π(u2)(y) ∧ f(x,y) occurs in μ(S), so if h(u1♮) = u1 and h(u2♮) = u2, then π(u1)(u1♮) and π(u2)(u2♮) hold, so f(u1♮,u2♮) is false and ι♮(f)(u1♮,u2♮) = 0. Moreover, because of the conjunct ∃x.π(u1)(x) and the conjunct ∃x.π(u2)(x), the left-hand side is non-empty, so it is equal to {0}.

Finally, consider the case ι(f)(u1,u2) = {0,1}. Then the conjuncts (18) hold in S♮. Because the first conjunct holds, there exist u1♮ and u2♮ such that h(u1♮) = u1 and h(u2♮) = u2 hold and ι♮(f)(u1♮,u2♮) = 1. Therefore, 1 belongs to the left-hand side of (19). Similarly, because the second conjunct holds, 0 belongs to the left-hand side of (19). Therefore (19) holds.

We conclude that S♮ ⊑_T^h S. Because every structure S has a corresponding equivalent formula μ(S), we have models[T2] ⊆ models[TR1]. To conclude models[T2] ⊇ models[TR1], we show that μ is surjective.

Let F be a canonical conjunction of TR1-literals. For each cube P(x) such that ∃x.P(x) occurs in F, let u_{P(x)} be a distinct element. Let U be the set of all such elements u_{P(x)}. Property 2 of Definition 33 ensures that U is non-empty. Let α be such that P(x) = ⋀_{A ∈ A1} A(x)^{α(A)}. Then define ι(A)(u_{P(x)}) = {α(A)} for all A ∈ A1. For A ∈ A \ A1, define ι(A)(u_{P(x)}) as {1} if ¬∃x.P(x) ∧ ¬A(x) occurs in F, as {0} if ¬∃x.P(x) ∧ A(x) occurs in F, and as {0,1} otherwise. Such a definition of ι(A)(u_{P(x)}) is possible because of Property 4 of Definition 33. Analogously, using Property 5 of Definition 33, for each f ∈ F, define ι(f)(u_{P1(x)}, u_{P2(x)}) as {1} if ¬∃x∃y.P1(x) ∧ P2(y) ∧ ¬f(x,y) occurs in F, as {0} if ¬∃x∃y.P1(x) ∧ P2(y) ∧ f(x,y) occurs in F, and as {0,1} otherwise. Let S = (U,ι). To show F = μ(S), it suffices to show that F and μ(S) contain the same set of conjuncts. It is easy to see that each conjunct of μ(S) occurs in F. The converse is also straightforward by Definition 33.

We conclude that μ is surjective, and models[T2] ⊇ models[TR1], which completes the proof.

It is easy to see that μ is, in fact, a bijection between the set 3-STRUCT and the set of canonical conjunctions of TR1-literals. ∎
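The construction of μ(S) in the proof above is mechanical enough to sketch in code. The following is an illustrative sketch only: the tuple encoding of literals, the function name `mu`, and the predicate names `x_points`, `marked` and `next` are hypothetical, and the structure is assumed to be A1-bounded (distinct nodes carry distinct cubes).

```python
from itertools import product

def mu(a1, nodes, unary, binary):
    """Literals of mu(S) for an A1-bounded three-valued structure.

    a1:     list of abstraction predicate names
    nodes:  dict node -> tuple of 0/1 values for the predicates in a1 (its cube)
    unary:  dict (A, node) -> subset of {0, 1}, for A outside a1
    binary: dict (f, node1, node2) -> subset of {0, 1}
    A literal is (sign, cubes, atom): sign "E" is an existential, "nE" a negated
    existential; atom is None or (name, polarity), polarity 0 meaning a negated atom.
    """
    literals = set()
    present = set(nodes.values())
    for cube in product((0, 1), repeat=len(a1)):       # node existence
        literals.add(("E" if cube in present else "nE", (cube,), None))
    def add(cubes, name, value):
        if value == {1}:
            literals.add(("nE", cubes, (name, 0)))     # no witness of the negated atom
        elif value == {0}:
            literals.add(("nE", cubes, (name, 1)))     # no witness of the atom
        else:
            literals.add(("E", cubes, (name, 1)))      # both truth values are realized
            literals.add(("E", cubes, (name, 0)))
    for (name, u), value in unary.items():             # node properties
        add((nodes[u],), name, set(value))
    for (name, u1, u2), value in binary.items():       # edges
        add((nodes[u1], nodes[u2]), name, set(value))
    return literals

# A two-node list shape: "head" satisfies x_points, "rest" is a summary node.
example = mu(
    ["x_points"],
    {"head": (1,), "rest": (0,)},
    {("marked", "head"): {1}, ("marked", "rest"): {0, 1}},
    {("next", "head", "head"): {0}, ("next", "head", "rest"): {0, 1},
     ("next", "rest", "head"): {0}, ("next", "rest", "rest"): {0, 1}},
)
```

The value {0,1} contributes two positive existential literals, which is exactly where the tight reading differs from the ordinary one: there the same value would contribute no literal at all.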


As in Section 3, we proceed to show that we can permit a richer syntactic structure without changing the expressive power of constraints.


Definition 35 (TR2-formulas) A TR2-formula is a disjunction of conjunctions of TR1-literals.


The following Lemma 36 is analogous to Lemma 13: it shows that any conjunction of TR1-literals can be transformed into an equivalent disjunction of canonical conjunctions of TR1-literals.


Lemma 36 Each conjunction of TR1-literals can be written as an equivalent TR1-formula.


Proof. Consider an arbitrary, not necessarily canonical, conjunction F of TR1-literals. We show how to transform F into an equivalent disjunction of canonical conjunctions of TR1-literals. The idea is to transform conjunctions into disjunctions of multiple conjunctions to ensure that all properties in Definition 33 are satisfied. We perform the following transformations as long as any of the properties in Definition 33 is violated.

Property 1. If both ∃x.P(x) and ¬∃x.P(x) occur, the entire conjunction is false and we eliminate it from the disjunction of conjunctions. If neither ∃x.P(x) nor ¬∃x.P(x) occurs, use the rule true → (∃x.P(x)) ∨ (¬∃x.P(x)) and then distribute the disjunction to the top level of the formula.

Property 2. First ensure that Property 1 holds. If the resulting conjunction contains no conjuncts of the form ∃x.P(x), then the conjunction contains a conjunct ¬∃x.P(x) for every cube P(x) over A1. Therefore, the entire conjunction is false and can be eliminated from the disjunction of conjunctions.

Property 3. Suppose that the literal ¬∃x.P(x) occurs in the conjunction. If the conjunction contains a literal of one of the forms ∃x.P(x) ∧ Q(x), ∃x∃y.P(x) ∧ Q(x,y), or ∃x∃y.P(y) ∧ Q(x,y), then the entire conjunction is contradictory and may be omitted from the disjunction of conjunctions. If there are no such literals, then (as in the proof of Lemma 13) remove all literals of the forms ¬∃x.P(x) ∧ Q(x), ¬∃x∃y.P(x) ∧ Q(x,y), and ¬∃x∃y.P(y) ∧ Q(x,y), because they are implied by ¬∃x.P(x).

Property 4. If both a literal and its negation occur in the conjunction, the entire conjunction is false. Hence, we can assume that (a) and (c) do not occur simultaneously and (b) and (c) do not occur simultaneously. To ensure that (a) and (b) do not occur simultaneously, use the replacement rule

\[ (\neg\exists x.\, P(x) \wedge A(x)) \wedge (\neg\exists x.\, P(x) \wedge \neg A(x)) \;\to\; \neg\exists x.\, P(x) \]

and then ensure again Property 3. We have thus shown how to ensure that no two of the cases (a), (b), (c) hold simultaneously. To ensure that at least one of the cases (a), (b), (c) holds, use the fact that ¬p ∨ ¬q ∨ (p ∧ q) is a tautology, and apply the rule

\[
\begin{array}{l}
\mathit{true} \;\to\; (\neg\exists x.\, P(x) \wedge A(x)) \;\vee \\
\quad (\neg\exists x.\, P(x) \wedge \neg A(x)) \;\vee \\
\quad ((\exists x.\, P(x) \wedge A(x)) \wedge (\exists x.\, P(x) \wedge \neg A(x)))
\end{array}
\]

Then propagate the disjunction to the top level of the formula. Then ensure that no two cases apply simultaneously, as described previously.

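Property 5. Ensuring Property 5 is analogous to ensuring Property 4. If both a literal and its negation occur in the conjunction, the entire conjunction is false. Hence, we can assume that (a) and (c) do not occur simultaneously and (b) and (c) do not occur simultaneously. To ensure that (a) and (b) do not occur simultaneously, use the replacement rule

\[
\begin{array}{l}
(\neg\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge f(x,y)) \;\wedge \\
(\neg\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge \neg f(x,y)) \;\to\; \\
\quad (\neg\exists x.\, P_1(x)) \vee (\neg\exists y.\, P_2(y))
\end{array}
\]

(as in the proof of Lemma 13), propagate the disjunction to the top level of the formula, and then ensure again Property 3. We have thus shown how to ensure that no two of the cases (a), (b), (c) hold simultaneously. To ensure that at least one of the cases (a), (b), (c) holds, use the fact that ¬p ∨ ¬q ∨ (p ∧ q) is a tautology, and apply the rule

\[
\begin{array}{l}
\mathit{true} \;\to\; (\neg\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge f(x,y)) \;\vee \\
\quad (\neg\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge \neg f(x,y)) \;\vee \\
\quad ((\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge f(x,y)) \;\wedge \\
\quad\;\; (\exists x \exists y.\, P_1(x) \wedge P_2(y) \wedge \neg f(x,y)))
\end{array}
\]

Then propagate the disjunction to the top level of the formula. Then ensure that no two cases apply simultaneously, as described previously. ∎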

Corollary 37 models[TR2] = models[TR1] 


Proof. Every TR1-formula is a TR2-formula. Conversely, let F be a TR2-formula. Then F is a disjunction of conjunctions of TR1-literals. By Lemma 36, transform each conjunction of F into a disjunction of canonical conjunctions of TR1-literals. The result is a TR1-formula. ∎


TR3-formulas remove the disjunctive normal form requirement on TR2-formulas.


Definition 38 (TR3-formulas) A TR3-formula is a boolean combination of TR1-atomic-formulas.


TR2-formulas are the disjunctive normal forms of TR3-formulas.


Lemma 39 Every TR3-formula is equivalent to a TR2-formula.


Proof. Let F be a TR3-formula. Then the disjunctive normal form of F is a TR2-formula. ∎


Corollary 40 models[TR3] = models[TR2]


Proof. Every TR2-formula is a TR3-formula, so models[TR3] ⊇ models[TR2]. The converse models[TR3] ⊆ models[TR2] follows from Lemma 39. ∎


Analogously to R4-formulas in Section 3, we introduce TR4-formulas that allow using boolean combinations of more complex atomic formulas.


Definition 41 (TR4-formulas) Let B1(x), B2(y) range over arbitrary boolean combinations of elements of A1, let Q(x) range over disjunctions of literals of the form A(x) and ¬A(x) where A ∈ A \ A1, and let g(x,y) range over disjunctions of literals of the form f(x,y) and ¬f(x,y) where f ∈ F.

A TR4-atomic-formula is a formula of one of the following forms:

1. ∃x.B1(x)
2. ∃x.B1(x) ∧ Q(x)
3. ∃x∃y.B1(x) ∧ B2(y) ∧ g(x,y)

A TR4-literal is a TR4-atomic-formula or its negation. A TR4-formula is a boolean combination of TR4-atomic-formulas.


\[
\begin{array}{l}
\exists x.\, B_1(x) \vee B_2(x) \;\to\; (\exists x.\, B_1(x)) \vee (\exists x.\, B_2(x)) \\[4pt]
\exists x.\, (B_1(x) \vee B_2(x)) \wedge Q(x) \;\to\; \\
\quad (\exists x.\, B_1(x) \wedge Q(x)) \vee (\exists x.\, B_2(x) \wedge Q(x)) \\[4pt]
\exists x.\, B_1(x) \wedge (Q_1(x) \vee Q_2(x)) \;\to\; \\
\quad (\exists x.\, B_1(x) \wedge Q_1(x)) \vee (\exists x.\, B_1(x) \wedge Q_2(x)) \\[4pt]
\exists x \exists y.\, (B_{11}(x) \vee B_{12}(x)) \wedge B_2(y) \wedge g(x,y) \;\to\; \\
\quad (\exists x \exists y.\, B_{11}(x) \wedge B_2(y) \wedge g(x,y)) \vee (\exists x \exists y.\, B_{12}(x) \wedge B_2(y) \wedge g(x,y)) \\[4pt]
\exists x \exists y.\, B_1(x) \wedge (B_{21}(y) \vee B_{22}(y)) \wedge g(x,y) \;\to\; \\
\quad (\exists x \exists y.\, B_1(x) \wedge B_{21}(y) \wedge g(x,y)) \vee (\exists x \exists y.\, B_1(x) \wedge B_{22}(y) \wedge g(x,y)) \\[4pt]
\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge (g_1(x,y) \vee g_2(x,y)) \;\to\; \\
\quad (\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge g_1(x,y)) \vee (\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge g_2(x,y))
\end{array}
\]

Figure 2: Transforming TR4-literals into TR1-literals


Lemma 42 models[TR4] = models[TR3] 


Proof. Each formula of the form ¬∃x.B1(x) is equivalent to the formula ¬∃x.B1(x) ∧ (A(x) ∨ ¬A(x)), which is of the form ¬∃x.B1(x) ∧ Q(x). Therefore, TR4 is a richer class than TR3, so models[TR4] ⊇ models[TR3]. To show the converse, transform each TR4-literal into a boolean combination of TR1-literals.

First, transform each boolean combination B(x) (and B(y)) of A1 predicates into canonical disjunctive normal form, so that each B(x) is a disjunction of cubes. Then apply the rules in Figure 2 to decompose TR4-literals into TR1-literals. ∎


Instead of existential quantifiers, we may use atomic for- 
mulas that contain universal quantifiers. 


Definition 43 (TR5-formulas) Let B1(x), B2(y) denote arbitrary boolean combinations of elements of A1, let Qp(x) denote conjunctions of literals of the form A(x) and ¬A(x) for A ∈ A \ A1, and let gp(x,y) denote conjunctions of literals f(x,y) and ¬f(x,y) for f ∈ F.

A TR5-atomic-formula is a formula of one of the following forms:

1. ∀x.B1(x)
2. ∀x.B1(x) ⇒ Qp(x)
3. ∀x.B1(x) ⇒ ∀y.B2(y) ⇒ gp(x,y)

A TR5-formula is a boolean combination of TR5-atomic-formulas.


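\[
\begin{array}{l|l}
TR_5\text{-atomic-formula} & TR_4\text{-formula} \\ \hline
\forall x.\, B_1(x) & \neg\exists x.\, \neg B_1(x) \\
\forall x.\, B_1(x) \Rightarrow (L_1(x) \wedge \ldots \wedge L_k(x)) & \neg\exists x.\, B_1(x) \wedge (\overline{L_1}(x) \vee \ldots \vee \overline{L_k}(x)) \\
\forall x.\, B_1(x) \Rightarrow \forall y.\, B_2(y) \Rightarrow (L_1(x,y) \wedge \ldots \wedge L_k(x,y)) & \neg\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge (\overline{L_1}(x,y) \vee \ldots \vee \overline{L_k}(x,y))
\end{array}
\]

Figure 3: Mapping TR5-atomic-formulas to TR4-formulas


\[
\begin{array}{l|l}
TR_4\text{-atomic-formula} & TR_5\text{-formula} \\ \hline
\exists x.\, B_1(x) & \neg\forall x.\, \neg B_1(x) \\
\exists x.\, B_1(x) \wedge (L_1(x) \vee \ldots \vee L_k(x)) & \neg\forall x.\, B_1(x) \Rightarrow (\overline{L_1}(x) \wedge \ldots \wedge \overline{L_k}(x)) \\
\exists x \exists y.\, B_1(x) \wedge B_2(y) \wedge (L_1(x,y) \vee \ldots \vee L_k(x,y)) & \neg\forall x.\, B_1(x) \Rightarrow \forall y.\, B_2(y) \Rightarrow (\overline{L_1}(x,y) \wedge \ldots \wedge \overline{L_k}(x,y))
\end{array}
\]

Figure 4: Mapping TR4-atomic-formulas to TR5-formulas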


Lemma 44 models[TR5] = models[TR4]


Proof. For each TR5-atomic-formula there exists a corresponding equivalent TR4-formula, and for each TR4-atomic-formula there exists a corresponding equivalent TR5-formula.

The mapping from TR5-atomic-formulas to TR4-formulas is in Figure 3; the mapping of TR4-atomic-formulas to TR5-formulas is in Figure 4. We use the notation L̄ to denote L1 if L is of the form ¬L1 for some L1, and ¬L if L is not of the form ¬L1 for some L1. ∎


The following Corollary 45 summarizes the results on different representations of constraints corresponding to three-valued structures with tight concretization.


Corollary 45

\[ \mathrm{models}[T_2] = \mathrm{models}[TR_1] = \mathrm{models}[TR_2] = \mathrm{models}[TR_3] = \mathrm{models}[TR_4] = \mathrm{models}[TR_5] \]


Definition 46 (Boolean Shape Analysis Constraints) We call the set of sets models[T2] boolean shape analysis constraints.


4.1 Closure under Boolean Operations 


By definition, TR3-formulas are closed under all boolean operations.


Corollary 47 The family of sets models[T2] forms a 
boolean algebra of sets which is a subalgebra of the boolean 
algebra of all subsets of 2-STRUCT. 


As an example consequence of closure under all boolean 
set operations we obtain the following proposition. 


Proposition 48 There is an algorithm that constructs, given two finite sets of bounded three-valued structures S1 and S2, a finite set of bounded three-valued structures S3 such that:

\[ \gamma_T^*(S_1) \subseteq \gamma_T^*(S_2) \quad\text{iff}\quad \gamma_T^*(S_3) = \emptyset \]

Similarly, the equivalence of two sets of three-valued structures reduces to satisfiability.

Proposition 49 There is an algorithm that constructs, given two finite sets of bounded three-valued structures S1 and S2, a finite set of bounded three-valued structures S3 such that:

\[ \gamma_T^*(S_1) = \gamma_T^*(S_2) \quad\text{iff}\quad \gamma_T^*(S_3) = \emptyset \]
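The set-theoretic core of both reductions can be illustrated on explicit finite sets. This is only a sketch under a strong simplifying assumption: the sets of models are given as finite Python sets of opaque identifiers, whereas the propositions above construct S3 symbolically, relying on closure under boolean operations. The function names are hypothetical.

```python
def entailment_witnesses(models1, models2):
    """Models of S1 that are not models of S2; empty iff models1 is a subset of models2."""
    return models1 - models2

def equivalence_witnesses(models1, models2):
    """Symmetric difference; empty iff the two sets of models are equal."""
    return models1 ^ models2

def entails(models1, models2):
    return not entailment_witnesses(models1, models2)

def equivalent(models1, models2):
    return not equivalence_witnesses(models1, models2)
```

At the level of formulas, the first witness set corresponds to F1 ∧ ¬F2 and the second to (F1 ∧ ¬F2) ∨ (F2 ∧ ¬F1); both require negation, which is why the reduction needs a class of constraints closed under complement.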


4.2 Relationship with Non-Tight Concretization 


In Proposition 50 below we observe that three-valued structures with tight concretization (Definition 30) are at least as expressive as three-valued structures with concretization (Definition 4).


Proposition 50 models[T2] ⊇ models[T1].


Proof. By definition, every R4-formula is a TR4-formula, so models[TR4] ⊇ models[R4]. Therefore,

\[ \mathrm{models}[T_2] = \mathrm{models}[TR_4] \supseteq \mathrm{models}[R_4] = \mathrm{models}[T_1] \]

∎


Proposition 50 implies that, even if we work with the interpretation of three-valued structures under concretization, we can convert three-valued structures into boolean shape analysis constraints and check for entailment or equivalence of the original constraints via satisfiability of the richer boolean shape analysis constraints. In fact, Proposition 51 below shows that boolean shape analysis constraints models[T2] are the smallest extension of the constraints models[T1] that has this desirable property.


Proposition 51 models[T2] is the smallest superset of
models[T1] that is closed under all boolean operations.
Proof. models[T2] ⊇ models[T1] by Proposition 50,
and models[T2] is closed under all boolean operations by
Corollary 47. Because models[T2] = models[TR4] and
models[T1] = models[R4], it remains to show that every
TR4-formula is (equivalent to) a boolean combination of some
R4-literals. By definition, TR4-formulas are boolean
combinations of TR4 atomic formulas, so it suffices to show
that each TR4 atomic formula is a boolean combination of
R4-literals. That is certainly true; in fact, it suffices to
use at most one negation of an R4-literal to obtain any
TR4-literal. □


4.3 Node Splitting 


Given a three-valued structure $S = (U, \iota)$, it is
desirable that $\iota(A)(u) \in \{\{0\}, \{1\}\}$ for all
$u \in U$ and $A \in \mathcal{A}$. This property holds if all
predicates are abstraction predicates, that is, if
$\mathcal{A}_1 = \mathcal{A}$. The following Proposition 52
shows that we can always assume that
$\mathcal{A}_1 = \mathcal{A}$ if the syntactic class of
formulas is sufficiently rich.


Proposition 52 Every TR4-formula with the set of
abstraction predicates $\mathcal{A}_1 \subseteq \mathcal{A}$
is also a TR4-formula with the set of abstraction predicates
$\mathcal{A}_1 = \mathcal{A}$.


Proof. Observe that, in the atomic formula
$F_1 = \exists x.\ B_1(x) \land Q(x)$ of Definition 41, the
subformula $B_2(x) = B_1(x) \land Q(x)$ is a boolean
combination of predicates from $\mathcal{A}$, so $F_1$ is of
the form $\exists x.\ B_2(x)$ for a boolean combination
$B_2(x)$ of predicates from $\mathcal{A}$. □


Note that the converse of Proposition 52 is not true. For
example, if $A, A' \in \mathcal{A} \setminus \mathcal{A}_1$
then the property $\lnot \exists x.\ A(x) \land \lnot A'(x)$,
which correlates two non-abstraction predicates, is a
TR4-formula with the set of abstraction predicates
$\mathcal{A}$, but is not equivalent to any TR4-formula with
the set of abstraction predicates $\mathcal{A}_1$.


Definition 53 (Split Form) Let $F_1$ be a TR4-formula
with the set of abstraction predicates
$\mathcal{A}_1 \subseteq \mathcal{A}$. By Proposition 52 and
Corollary 45 let $F_2$ be a TR1-formula with the set of
abstraction predicates $\mathcal{A}$ such that $F_2$ is
equivalent to $F_1$. We call $F_2$ the split form of $F_1$.


Letting $\mathcal{A}_1 = \mathcal{A}$ in Definition 41 we
obtain the following Corollary 54.


Corollary 54 (Split Form Formulas) The set of split
forms of TR4-formulas is precisely the set of boolean
combinations of formulas of the form


1. $\exists x.\ B_1(x)$


2. $\exists x\, \exists y.\ B_1(x) \land B_2(y) \land g(x, y)$


where $B_1(x)$, $B_2(y)$ are boolean combinations of literals
of the form $A(x)$ and $A(y)$ for $A \in \mathcal{A}$, and
$g(x, y)$ ranges over disjunctions of literals of the form
$f(x, y)$ and $\lnot f(x, y)$ for $f \in \mathcal{F}$.


5 Decidability of Independent Predicates 


In this section we present decidability results for
constraints expressed by three-valued structures under tight
concretization. We show that satisfiability, entailment and
equivalence of boolean shape analysis constraints are all
decidable. Boolean shape analysis constraints (TR1-formulas)
are at least as expressive as R1-formulas by Proposition 50,
so we obtain decidability results for R1-formulas as well.


Formulation of Decidability Problems We assume finite
sets $\mathcal{A}$ and $\mathcal{F}$ of predicates. As a
result, the number of non-isomorphic bounded three-valued
structures, and, therefore, the number of non-equivalent
R1-formulas, is finite. Therefore, for fixed $\mathcal{A}$
and $\mathcal{F}$, a problem of the form:


Given a TR1-formula $F$, is $F$ satisfiable?


is essentially finite and therefore trivially decidable.
However, we are interested in having a single algorithm that
would give decidability for any number of unary and binary
predicates. Therefore, the size of the sets $\mathcal{A}$ and
$\mathcal{F}$ is part of the input to the decision procedure
we are looking for. For example, we are interested in
questions of the form:


Given sets $\mathcal{A}$ and $\mathcal{F}$ and a TR1-formula
$F$ over predicates $\mathcal{A}$ and $\mathcal{F}$, is $F$
satisfiable?


In this section we study such decidability questions for
independent predicates, when the three-valued structures are
interpreted over the entire set 2-STRUCT. Section 6.4
addresses the more general case where some of the predicates
are defined using first-order formulas, which means that
formulas are interpreted over a subset of 2-STRUCT.

Satisfiability of TR1-formulas over 2-STRUCT is decidable.
In fact, the proof of the following Lemma 55 shows
that every disjunct of a TR1-formula has a small model in
2-STRUCT.


Lemma 55 Let $F$ be a canonical conjunction of
TR1-literals and let the number of cubes $P(x)$ over
$\mathcal{A}_1$ such that $\exists x.\ P(x)$ occurs in $F$ be
$n$. Then there exists a two-valued structure
$S^\natural = (U^\natural, \iota^\natural)$ such that
$|U^\natural| = 2n$ and $F$ is true in $S^\natural$.


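Proof. Let $S = (U, \iota)$ be the structure that corresponds
to $F$ by the proof of Proposition 34. Let
$U = \{u_1, \ldots, u_n\}$. Define
$S^\natural = (U^\natural, \iota^\natural)$ as follows. Let
$U^\natural = \{u^\natural_1, u^\natural_2, \ldots, u^\natural_{2n}\}$.
In the sequel we define $\iota^\natural$ so that
$S^\natural \sqsubseteq^h_T S$ where $h$ is given by


$h(u^\natural_{2i-1}) = u_i$
$h(u^\natural_{2i}) = u_i$


for $1 \le i \le n$. By definition, $h$ is surjective.
Define $\iota^\natural(A)$ for $A \in \mathcal{A}_1$ as
follows. Let $1 \le i \le n$. Then $\iota(A)(u_i) = \{0\}$ or
$\iota(A)(u_i) = \{1\}$. If $\iota(A)(u_i) = \{0\}$, define


$\iota^\natural(A)(u^\natural_{2i-1}) = \iota^\natural(A)(u^\natural_{2i}) = 0$


If $\iota(A)(u_i) = \{1\}$, define


$\iota^\natural(A)(u^\natural_{2i-1}) = \iota^\natural(A)(u^\natural_{2i}) = 1$


Define $\iota^\natural(A)$ for
$A \in \mathcal{A} \setminus \mathcal{A}_1$ as follows. Let
$1 \le i \le n$. If $\iota(A)(u_i) = \{0\}$, define


$\iota^\natural(A)(u^\natural_{2i-1}) = \iota^\natural(A)(u^\natural_{2i}) = 0$


If $\iota(A)(u_i) = \{1\}$, define


$\iota^\natural(A)(u^\natural_{2i-1}) = \iota^\natural(A)(u^\natural_{2i}) = 1$


If $\iota(A)(u_i) = \{0, 1\}$, define


$\iota^\natural(A)(u^\natural_{2i-1}) = 0$
$\iota^\natural(A)(u^\natural_{2i}) = 1$


Define $\iota^\natural(f)$ for $f \in \mathcal{F}$ as follows.
Let $1 \le i, j \le n$. If $\iota(f)(u_i, u_j) = \{0\}$, define


$\iota^\natural(f)(u^\natural_k, u^\natural_l) = 0$


for $k \in \{2i-1, 2i\}$ and $l \in \{2j-1, 2j\}$. If
$\iota(f)(u_i, u_j) = \{1\}$, define


$\iota^\natural(f)(u^\natural_k, u^\natural_l) = 1$


for $k \in \{2i-1, 2i\}$ and $l \in \{2j-1, 2j\}$. If
$\iota(f)(u_i, u_j) = \{0, 1\}$, define


$\iota^\natural(f)(u^\natural_k, u^\natural_{2j-1}) = 0$
$\iota^\natural(f)(u^\natural_k, u^\natural_{2j}) = 1$


for $k \in \{2i-1, 2i\}$.
It is straightforward to show $S^\natural \sqsubseteq^h_T S$.
Therefore, $F$ holds in $S^\natural$. □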
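

The node-doubling construction in the proof of Lemma 55 can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that tight concretization requires the set of concrete values over the preimage of each abstract node (or pair of nodes) to equal the abstract value set exactly, and that for an indefinite value {0, 1} the first copy receives 0 and the second receives 1. Nodes are 0-based, so abstract node i is represented by concrete nodes 2i and 2i+1; the names `small_model` and `is_tight` are hypothetical.

```python
def small_model(n, unary, binary):
    """Build a two-valued structure with 2n nodes from a three-valued one.

    unary:  dict name -> list of n non-empty subsets of {0, 1}
    binary: dict name -> n x n nested list of non-empty subsets of {0, 1}
    The embedding is h(k) = k // 2.
    """
    def pick(values, copy):
        # A definite value {v} is copied to both nodes; an indefinite
        # value {0, 1} is split so that both 0 and 1 are realized.
        if len(values) == 1:
            return next(iter(values))
        return copy  # copy 0 gets 0, copy 1 gets 1

    c_unary = {a: [pick(vals[k // 2], k % 2) for k in range(2 * n)]
               for a, vals in unary.items()}
    c_binary = {f: [[pick(rel[k // 2][l // 2], l % 2)
                     for l in range(2 * n)]
                    for k in range(2 * n)]
                for f, rel in binary.items()}
    return c_unary, c_binary


def is_tight(n, unary, binary, c_unary, c_binary):
    """Check that the concrete values over each preimage under h
    are exactly the abstract value set (the assumed tightness condition)."""
    for a, vals in unary.items():
        for i in range(n):
            if {c_unary[a][2 * i], c_unary[a][2 * i + 1]} != set(vals[i]):
                return False
    for f, rel in binary.items():
        for i in range(n):
            for j in range(n):
                seen = {c_binary[f][k][l]
                        for k in (2 * i, 2 * i + 1)
                        for l in (2 * j, 2 * j + 1)}
                if seen != set(rel[i][j]):
                    return False
    return True
```

Two copies per abstract node suffice because each three-valued entry needs at most two distinct concrete values to be realized.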


Corollary 56 $\gamma_T(S) = \emptyset$ iff $S = \emptyset$.
Proof. By Lemma 55. □


We note that the construction of the model in Lemma 55
becomes even simpler if we assume that the formula $F$ is in
the split form. Corollary 56 then follows from the
observation that if $F$ has at least one disjunct then the
split form of $F$ has at least one disjunct.


Corollary 57 The following questions are decidable for
sets $S_1, S_2$ of three-valued structures:


1. $\gamma_T(S_1) \subseteq \gamma_T(S_2)$;
2. $\gamma_T(S_1) = \gamma_T(S_2)$.


Proof. By Corollary 56, Proposition 48, and
Proposition 49. □


6 Structures with Defined Predicates 


In this section we introduce the notion of a three-valued
structure with defined predicates. Previous sections
interpret three-valued structures and formulas over the set
2-STRUCT of all two-valued structures. In general, it is
useful to interpret three-valued structures and formulas over
some subset 2-CSTRUCT ⊆ 2-STRUCT of compatible
two-valued structures Page 268].


6.1 Compatible Structures 


We view structures with defined predicates as a way of
defining a subset 2-CSTRUCT ⊆ 2-STRUCT.


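Definition 58 (Compatible Structures) Let
$\mathcal{A}_2 \subseteq \mathcal{A}$ be a set of defined
unary predicates and $\mathcal{F}_2 \subseteq \mathcal{F}$ be
a set of defined binary predicates. Let
2-SEM-STRUCT ⊆ 2-STRUCT be the set of two-valued structures
that satisfy the constraints of the semantics of the
programming language. Next, for each $A \in \mathcal{A}_2$,
and each two-valued structure $S^\natural \in$ 2-STRUCT where
$S^\natural = (U^\natural, \iota^\natural)$, let
$d_A(S^\natural) : U^\natural \to \{0, 1\}$ be a unary
predicate. For each $f \in \mathcal{F}_2$, and each
two-valued structure $S^\natural \in$ 2-STRUCT where
$S^\natural = (U^\natural, \iota^\natural)$, let
$d_f(S^\natural) : (U^\natural)^2 \to \{0, 1\}$ be a binary
predicate. Define

$$
\begin{array}{l}
\text{2-CSTRUCT} = \{\, S^\natural = (U^\natural, \iota^\natural) \mid \\
\quad S^\natural \in \text{2-SEM-STRUCT} \;\land \\
\quad \bigwedge_{A \in \mathcal{A}_2} \forall u^\natural \in U^\natural.\ \iota^\natural(A)(u^\natural) = d_A(S^\natural)(u^\natural) \;\land \\
\quad \bigwedge_{f \in \mathcal{F}_2} \forall u_1^\natural, u_2^\natural \in U^\natural.\ \iota^\natural(f)(u_1^\natural, u_2^\natural) = d_f(S^\natural)(u_1^\natural, u_2^\natural) \,\}
\end{array}
\qquad (20)
$$


Definition 59 below introduces tight concretization with
respect to compatible structures, in the natural way.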


Definition 59 (Compatible Tight Concretization) If
$S \subseteq$ 3-STRUCT is a set of three-valued structures,
define


$c\gamma_T(S) = \gamma_T(S) \cap \text{2-CSTRUCT}$


We then use $c\gamma_T$ to define the class of definable sets
models[cT2]. With the results of Section 4.3 in mind, we
let $\mathcal{A}_1 = \mathcal{A}$.


Definition 60 The set of sets of compatible two-valued
structures definable via three-valued structures with tight
concretization is defined by:


$\mathrm{models}[cT_2] = \{\, c\gamma_T(S) \mid S$ a finite set of $\mathcal{A}_1$-bounded three-valued structures $\}$


Lemma 61
$\mathrm{models}[cT_2] = \{\, S^\natural \cap \text{2-CSTRUCT} \mid S^\natural \in \mathrm{models}[T_2] \,\}$
Proof. Immediate by Definition 60 and Definition 59. □


6.2 Formulas for Compatible Structures


In Section 4 we have characterized sets of two-valued
structures using formulas. We now characterize sets of
compatible two-valued structures by conjoining the formulas
with the compatibility formula.


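Definition 62 (Compatibility Formula) Let $\psi_0$ be a
sentence that axiomatizes the set 2-SEM-STRUCT, so that:


$\mathrm{models}[\{\psi_0\}] = \text{2-SEM-STRUCT}$


Let the value of each predicate $d_A(S^\natural)$ for
$A \in \mathcal{A}_2$ be equal to the Tarskian semantics of
some formula $\psi_A(x)$ in the structure $S^\natural$:


$d_A(S^\natural)(u^\natural) = [\![\psi_A(x)]\!]^{S^\natural}[x \mapsto u^\natural]$


and let the value of each predicate $d_f(S^\natural)$ for
$f \in \mathcal{F}_2$ be equal to the Tarskian semantics of
some formula $\psi_f(x, y)$ in the structure $S^\natural$:


$d_f(S^\natural)(u_1^\natural, u_2^\natural) = [\![\psi_f(x, y)]\!]^{S^\natural}[x \mapsto u_1^\natural, y \mapsto u_2^\natural]$


Define the compatibility formula $F_\psi$ by:


$$
F_\psi = \psi_0 \;\land\;
\bigwedge_{A \in \mathcal{A}_2} \forall x.\ (A(x) \Leftrightarrow \psi_A(x)) \;\land\;
\bigwedge_{f \in \mathcal{F}_2} \forall x, y.\ (f(x, y) \Leftrightarrow \psi_f(x, y))
$$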


For each class of formulas TRi we introduce the
corresponding class cTRi by conjoining the formulas with
$F_\psi$.


Definition 63 (Formulas for Compatible Structures)
For each $i$ where $1 \le i \le 5$, let the set of cTRi
formulas be the set of all formulas $B \land F_\psi$ for $B$ a
TRi formula.


Lemma 64 below shows that the compatibility formula
defines precisely the subset of compatible two-valued
structures.


Lemma 64 (Compatibility Formula is Correct)


$\text{2-CSTRUCT} = \{\, S^\natural \in \text{2-STRUCT} \mid [\![F_\psi]\!]^{S^\natural} = 1 \,\}$
Proof. Immediate by Definition 58 and Definition 62. □
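

To make the compatibility condition of Definition 58 concrete, the following sketch checks whether a small finite two-valued structure stores, for every defined predicate, exactly the value its definition computes. The representation (dictionaries of booleans), the function `is_compatible`, and the example defined predicate `has_next` are all illustrative assumptions, not part of the paper. The semantic constraint 2-SEM-STRUCT is modelled by an optional callback that accepts every structure by default.

```python
def is_compatible(structure, unary_defs, binary_defs,
                  satisfies_semantics=lambda s: True):
    """Return True iff `structure` meets the compatibility condition.

    structure: (universe, unary, binary) where unary[A][u] and
               binary[f][(u, v)] are booleans.
    unary_defs[A](structure, u) plays the role of d_A(S)(u);
    binary_defs[f](structure, u, v) plays the role of d_f(S)(u, v).
    """
    universe, unary, binary = structure
    if not satisfies_semantics(structure):
        return False
    for name, d in unary_defs.items():
        # The stored value must agree with the definition on every node.
        if any(unary[name][u] != d(structure, u) for u in universe):
            return False
    for name, d in binary_defs.items():
        # The same requirement for every pair of nodes.
        if any(binary[name][(u, v)] != d(structure, u, v)
               for u in universe for v in universe):
            return False
    return True


def has_next(structure, u):
    """Illustrative defined predicate: holds iff there is y with next(u, y)."""
    universe, _, binary = structure
    return any(binary["next"][(u, v)] for v in universe)
```

For example, with a two-node universe in which only `next(0, 1)` holds, a structure is compatible exactly when it stores `has_next` as true on node 0 and false on node 1.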


As a result, we obtain the following characterization of 
the constraints expressible using cTR; formulas. 


Lemma 65 For each $i$ where $1 \le i \le 5$,
$\mathrm{models}[cTR_i] = \{\, S^\natural \cap \text{2-CSTRUCT} \mid S^\natural \in \mathrm{models}[TR_i] \,\}$


Proof. By Definition 63 and Lemma 64. □


The following Corollary 66 states the desired
correspondence between formulas and three-valued structures
with defined predicates.


Corollary 66
models[cT2] = models[cTR1] = models[cTR2] =
models[cTR3] = models[cTR4] = models[cTR5]
Proof. From Lemma 61, Lemma 65, and Corollary 45. □


6.3 Closure under Boolean Operations


We next show that, even in the presence of defined
predicates, we can reduce the entailment and the equivalence
of constraints to the satisfiability problem. This result
follows from the closure under boolean operations. The
results below generalize the results of Section 4.1.


Corollary 67 The family of sets models[cT2] forms a 
boolean algebra of sets which is a subalgebra of the boolean 
algebra of all subsets of 2-CSTRUCT. 


Proof. From Lemma 61, Lemma 65, and Corollary 47. □


Proposition 68 There is an algorithm that constructs,
given two finite sets of three-valued structures $S_1$ and
$S_2$, a finite set of three-valued structures $S_3$ such that:


$c\gamma_T(S_1) \subseteq c\gamma_T(S_2)$ iff $c\gamma_T(S_3) = \emptyset$


Proposition 69 There is an algorithm that constructs,
given two finite sets of three-valued structures $S_1$ and
$S_2$, a finite set of three-valued structures $S_3$ such that:


$c\gamma_T(S_1) = c\gamma_T(S_2)$ iff $c\gamma_T(S_3) = \emptyset$
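

One way to see why the reduction to emptiness survives the restriction to compatible structures is that intersection with 2-CSTRUCT distributes over set difference. The derivation below is a sketch under an assumption that is our reading rather than a statement from the text: $S_3$ is chosen, using the closure of Corollary 47, so that $\gamma_T(S_3) = \gamma_T(S_1) \setminus \gamma_T(S_2)$.

```latex
\begin{align*}
c\gamma_T(S_1) \subseteq c\gamma_T(S_2)
  &\iff (\gamma_T(S_1) \cap \text{2-CSTRUCT})
        \setminus (\gamma_T(S_2) \cap \text{2-CSTRUCT}) = \emptyset \\
  &\iff (\gamma_T(S_1) \setminus \gamma_T(S_2))
        \cap \text{2-CSTRUCT} = \emptyset \\
  &\iff \gamma_T(S_3) \cap \text{2-CSTRUCT} = \emptyset \\
  &\iff c\gamma_T(S_3) = \emptyset
\end{align*}
```

The equality case is analogous, with the symmetric difference in place of the difference.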


6.4 Decidability Properties 


The following conditional result generalizes the idea of
Section 5.


Corollary 70 Let $S, S_1, S_2$ range over finite sets of
three-valued structures with defined predicates. Assume that
the question $c\gamma_T(S) = \emptyset$ is decidable. Then the
following questions are decidable as well:


1. $c\gamma_T(S_1) \subseteq c\gamma_T(S_2)$;
2. $c\gamma_T(S_1) = c\gamma_T(S_2)$.
Proof. By Proposition 68 and Proposition 69. □


We present an example of constraints for which the
satisfiability question is decidable in [36]; other examples
of decidable constraints can be formulated based on the
techniques of logic $L_r$ of or based on monadic second-order
logic of trees, which is at the heart of the graph types
approach [43] [29] [30] [31] [17] [28].


7 Related Work


A parametric framework for shape analysis is presented in
[49]. A systematic presentation of three-valued logic with
equality is given in [44]. A description of a three-valued
logic analyzer is in [38], an extension to interprocedural
analysis is in and the use of shape analysis for program
verification is demonstrated in [39]. Other shape analysis
techniques include [37] [25] [16] [21] [20] [43] [33].

Our paper presents a contribution to the characteriza- 
tion of heap summaries by formulas, which is a promis- 
ing direction of shape analysis that has been initiated in 
[34]. Shape analysis constraints differ from reg- 
ular graph constraints [35] [34] because shape analysis con- 
straints characterize sets of objects by defining predicates, 
instead of using existential quantification over sets of
objects. Logic $L_r$ in allows specifying reachability
properties between local variables and is therefore
appropriate for expressing certain classes of shape graphs.
What $L_r$ does
not allow is defining a set of nodes A using some predicate
and then stating further properties of objects in the set A, 
which is one of the main expressive features of three-valued 
structures. 

Our work follows the line of shape analysis approaches
which view a program as transforming concrete graph
structures [49] [37] [25] [2] [20] [43] [33]. An alternative
approach is to identify each heap object using the set of
paths that lead to the object [16] [23] [8]. Other notations
for reasoning about the heap include spatial logic and alias
types [50] [51].

It is possible to apply predicate abstraction techniques 
[3] [2] [22] to perform shape analysis; the view of three-valued 
structures as boolean combinations of constraints of certain 
form may be beneficial for this direction of work and enable 
easier application of representations such as binary decision 
diagrams [42]. 

A shape analysis tool must ultimately take into account 
the definitions of instrumentation predicates, which requires 
some form of theorem proving or decision procedures. 
Page 272] uses rules based on Horn clauses for such reason- 
ing, whereas proposes the use of theorem provers. In 
this paper we have identified one component of the problem 
that is always decidable and useful: it is always possible to 
reduce entailment and equivalence problems to the
satisfiability problem. In [36], we report a concrete example
of constraints for which satisfiability is decidable; the
results in the present paper then imply that entailment
and equivalence are decidable as well.

Researchers have proposed several program checking 
techniques based on dataflow analysis, symbolic execution, 
and abstract interpretation [18] [24] [19] [9] [12] [6] [40]. The
primary strength of the shape analysis approach compared 
to the alternative approaches is the ability to perform sound 
and precise reasoning about dynamically allocated data 
structures. 

The boolean algebra of state predicates and predicate 
transformers has been used successfully as the foundation 
of refinement calculus [1]. In this paper we have identified
a particular subalgebra of the boolean algebra of all state 
predicates; we view this boolean algebra as providing the 
foundation of shape analysis. 


8 Conclusions 


We have characterized constraints used as dataflow facts of 
parametric shape analysis based on three-valued logic. Our 
characterization represents these dataflow facts as boolean 
combinations of formulas. The usual concretization seman- 
tics yields only positive boolean combinations. On the other 
hand, the tight concretization yields boolean shape analy- 
sis constraints, which are closed under all boolean combi- 
nations. Among the useful consequences of the closure of 
boolean shape analysis constraints under all boolean
operations is the fact that the entailment and the
equivalence of constraints are reducible to the
satisfiability of constraints.

We view the results of this paper as a step in further un- 
derstanding of the foundations of shape analysis. To make 
the connection with [49], this paper starts with three-valued 
structures and proceeds to characterize the structures using 
formulas. An alternative approach is to start with canonical 
formulas that express the desired properties and then ex- 
plore efficient ways of representing and manipulating these 
formulas. We believe that the entire framework can 
be reformulated using canonical forms of formulas instead 
of three-valued structures. We also expect that the idea 
of viewing dataflow facts as canonical forms of formulas is 
methodologically useful in general, especially for the analy- 
ses that verify complex program properties. 